Who feeds the world ? And how wealthy are they ?

There are folium maps in this notebook. If they do not display or do not display well, we suggest you to have a look at the .html file that is located the same repository.

Abstract

Are countries that plant more maize richer than countries planting rice? Is it true that developed countries produce more meat? Are you better off being a food net exporter or importer? Are food prices more stable if you produce more food locally or trade more?

In this project we analyze the effects that a country agricultural sector has on its different economic indicators. The indicators of the agricultural sector we used are crops and livestock production, exports and imports of crops, livestock and live animals. For these, we use the data from the "Global Food & Agriculture Statistics" datasets. We quantify the economic success by Gross Domestic Product (GDP), but also by price stability, as defined by low changes in Consumer Price Indices (CPI). We further use the Food and Agriculture Organization (FAO) definition of food self-sufficiency to analyze its link to economic success and stability. After finding the results of the agricultural products most highly linked with economic success, we create visualizations in the form of maps. Through these timeline maps, we show how the production/export/import of important products has developed globally. We also use maps to visualize the level of food self-sufficiency and price stability.

Research questions

We would like to work on the following research questions:

  • Which agricultural products are the most produced/exported/imported globally?
  • Which of them are correlated more highly with GDP?
  • Can we predict which countries will have further GDP growth based on the repartition of their agricultural sector?
  • How can we define price stability? How can we define food self-sufficiency? Is there a link?
  • What is countries agricultural trade balance? Are countries that are net exporters or importers richer ? Are self-sufficient countries richer ?
  • How does the geographical repartition of important agricultural products look like? Which countries are net food exporters or importers? How did this evolve over the last few decades ?

External imports:

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import json
import re
import requests
from bs4 import BeautifulSoup
from ipywidgets import interact
from IPython.display import display
import scipy.cluster.hierarchy as spc
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import Ridge
from operator import itemgetter
from sklearn import preprocessing
from tqdm import tqdm

Setup:

In [2]:
data_folder_path = "Scripts/Data/current_FAO/raw_files/"

files = {"Crops production" : "Production_Crops_E_All_Data_(Normalized).csv",
         "Food trade" : "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", 
         "Consumer price indices" : "ConsumerPriceIndices_E_All_Data_(Normalized).csv",
         "Macroeconomy" : "Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv",
         "Livestock production" : "Production_Livestock_E_All_Data_(Normalized).csv",
         "Live animals trade" : "Trade_LiveAnimals_E_All_Data_(Normalized).csv"
        }
interesting_datasets = files.keys()

1.A. Dataset description

Our main dataset is a subset of the "Global Food & Agriculture Statistics" that is found in the proposed datasets list. In this dataset, we have seen that we could work with the production as well as import and export quantities per year and per country. As far as food is concerned, we use crops, livestock and live animals. We have also found pieces of information about countries GDP and CPI in this database.

This database countains several files. We had a look of all the files. For food-related data about countries, we decided to focus on the following files:

  • Production_Crops_E_All_Data_(Normalized).csv contains data about crops production.
  • Trade_Crops_Livestock_E_All_Data_(Normalized).csv contains data about food trade (crops and livestock).
  • Production_Livestock_E_All_Data_(Normalized).csv contains data about livestock production.
  • Trade_LiveAnimals_E_All_Data_(Normalized).csv contains data about live animals trade.

For food-related data about countries, we decided to focus on the following files:

  • ConsumerPriceIndices_E_All_Data_(Normalized).csv contains data about consumer price indices (CPI).
  • Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv contains data about gross domestic product (GDP) along with other macroeconomic indicators.

1.B. Loading the data set

In [3]:
def load_datasets(datasets) :
    df = {}
    for dataset in datasets :
        file_path = data_folder_path + files[dataset]
        df[dataset] = pd.read_csv(file_path, encoding = "ISO-8859-1")
    return df

We load each interresting dataset in the dictionary df :

In [4]:
df = load_datasets(interesting_datasets)

1.C. Understanding the data set

In this part, we will have a first look of the datasets in order to get a first sense of the data.

In [5]:
def display_df(df, datasets):
    for dataset in datasets :
        display(dataset, df[dataset].sample(5))

In order to see what does the datasets look like, we display a sample of 5 rows for each of them :

In [6]:
display_df(df, interesting_datasets)
'Crops production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
1155685 28 Myanmar 44 Barley 5525 Seed 1991 1991 tonnes 0.0 M
638800 70 French Polynesia 358 Cabbages and other brassicas 5510 Production 1973 1973 tonnes 688.0 NaN
2461425 5706 European Union 71 Rye 5510 Production 1995 1995 tonnes 13396237.0 A
584947 63 Estonia 103 Grain, mixed 5525 Seed 1999 1999 tonnes 3500.0 F
1326334 170 Peru 373 Spinach 5510 Production 1993 1993 tonnes 7153.0 NaN
'Food trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
6432420 133 Mali 892 Yoghurt, concentrated or not 5910 Export Quantity 1970 1970 tonnes NaN M
9825879 210 Sweden 245 Cake, groundnuts 5622 Import Value 1990 1990 1000 US$ 8.0 NaN
4297633 84 Greece 658 Coffee, substitutes containing coffee 5610 Import Quantity 1993 1993 tonnes 0.0 *
987570 15 Belgium-Luxembourg 1933 Meat Prepared Pres 5910 Export Quantity 1986 1986 tonnes 73739.0 A
12895215 5302 Eastern Asia 634 Beverages, distilled alcoholic 5922 Export Value 1965 1965 1000 US$ 1504.0 A
'Consumer price indices'
Area Code Area Item Code Item Months Code Months Year Code Year Unit Value Flag Note
9274 32 Cameroon 23013 Consumer Prices, Food Indices (2010 = 100) 7010 October 2015 2015 NaN 111.500000 X 2011
11460 41 China, mainland 23012 Consumer Prices, General Indices (2010 = 100) 7003 March 2011 2011 NaN 104.664138 X 2010
38108 153 New Caledonia 23013 Consumer Prices, Food Indices (2010 = 100) 7012 December 2016 2016 NaN 113.040000 X 2010
3148 52 Azerbaijan 23012 Consumer Prices, General Indices (2010 = 100) 7012 December 2012 2012 NaN 109.874245 X 2010
20605 81 Ghana 23012 Consumer Prices, General Indices (2010 = 100) 7008 August 2011 2011 NaN 111.117826 X 2010
'Macroeconomy'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
403145 191 Saint Vincent and the Grenadines 22011 Gross National Income 6109 Value Local Currency 1987 1987 millions 462.567748 XAM
350972 299 Occupied Palestinian Territory 22008 Gross Domestic Product 6155 Annual growth Local Currency 1983 1983 % 3.759361 Fc
382191 179 Qatar 22015 Gross Fixed Capital Formation 6129 Annual growth US$ 1991 1991 % -4.087014 Fc
373439 171 Philippines 22078 Value Added (Manufacture of tobacco products) 6110 Value US$ 2000 2000 millions 706.801985 Fc
9204 6 Andorra 22075 Value Added (Total Manufacturing) 6103 Share of GDP in US$ 2004 2004 % 3.678343 Fc
'Livestock production'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
23061 40 Chile 1746 Cattle and Buffaloes 5111 Stocks 1970 1970 Head 2998675.0 A
93703 177 Puerto Rico 976 Sheep 5111 Stocks 1997 1997 Head 16000.0 F
85756 162 Norway 1016 Goats 5111 Stocks 1991 1991 Head 89698.0 NaN
52282 91 Guyana 1096 Horses 5111 Stocks 1968 1968 Head 3650.0 NaN
113558 216 Thailand 1072 Geese and guinea fowls 5112 Stocks 1980 1980 1000 Head 195.0 NaN
'Live animals trade'
Area Code Area Item Code Item Element Code Element Year Code Year Unit Value Flag
129285 49 Cuba 1079 Turkeys 5609 Import Quantity 1995 1995 1000 Head NaN M
573576 5207 South America 1126 Camels 5622 Import Value 2011 2011 1000 US$ NaN A
346770 164 Pacific Islands Trust Territory 10 Total Merchandise Trade 5622 Import Value 1983 1983 1000 US$ 45000.0 F
111278 214 China, Taiwan Province of 1034 Pigs 5608 Import Quantity 1993 1993 Head 179.0 NaN
517109 261 EU(12)ex.int 976 Sheep 5922 Export Value 2006 2006 1000 US$ 4632.0 A

At first glance, our datasets seem very clean.

Each of our dataset contains a column "Year" and a column that is named "Area". This is a great news for us since we want to do a both geographical and time-related analysis.

The column "Area" correspond to the country except it may contains a group of country (e.g. "Eastern Europe").

1.D. Cleansing the data set

In this part, we will clean the datasets. The final goal is to produce one uniformized dataset on which we could work (see 1.F.).

In a very simplistic way, such a cleaned and uniformized dataset may look like this :

Country | Year | GDP | CPI | Food production features | Food trade features

1.D.a. Removing unuseful data

In this section, we will create dataframes in df_useful which correspond to previous dataframes without the unuseful data.

In [7]:
df_useful = {}
1.D.a.i. Extracting GDP from the "Macroeconomy" dataset

The "Macroeconomy" dataset contains many different measures: Gross Fixed Capital Formation, Gross National Income, Value Added (Total Manufacturing), ... We are only interested in Gross Domestic Product. Therefore, we extract it Gross Domestic Product from the "Macroeconomy" dataset. In order to have uniformisation among values, we choose the US$ value. All of them have the same unit (millions US\\$) so we can drop the "Unit" column as well.

In [8]:
def extract_GDP(df):
    def selection_GDP(df):
        return df['Item']=='Gross Domestic Product'
    def selection_US_dollars(df):
        return df['Element']=="Value US$"
    def drop_columns(df):
        dropped_colmuns = ["Item Code", "Item", "Element Code", "Element", "Flag", "Year Code", "Unit"]
        return df.drop(columns = dropped_colmuns)
    return drop_columns(df[selection_GDP(df)&selection_US_dollars(df)])
In [9]:
df_useful["GDP"] = extract_GDP(df["Macroeconomy"])

We can have have a look at a sample of the extrated dataset:

In [10]:
display(df_useful["GDP"].sample(5))
Area Code Area Year Value
5131 4 Algeria 1979 33201.433009
524314 248 Yugoslav SFR 1985 73311.647653
47 2 Afghanistan 1971 1831.108982
260981 121 Lebanon 2003 19801.658375
79039 35 Cabo Verde 2004 1051.195104

And we can plot GDP in million US$ for different countries for the period 1970-2015:

In [11]:
select_switzerland = df_useful["GDP"]['Area']=='Switzerland'
select_france = df_useful["GDP"]['Area']=='France'
select_austria = df_useful["GDP"]['Area']=='Austria'
select_canada = df_useful["GDP"]['Area']=='Canada'
ax = df_useful["GDP"][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('GDP in million US$ for different countries for the period 1970-2015')

For dissolute or new countries, we have some Nan values (before appearing or after dissolution) as in this next example :

In [12]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('GDP in million US$ for different countries for the period 1970-2015')
1.D.a.ii. Extracting crops harvested area, production, seed and yield from the "Crops production" dataset

We want to extract crops harvested area, production, seed and yield from the "Crops production" dataset. As all crops are not food crops, we request the World crops database to keep only the food crops.

In [13]:
def get_food_crops():
    #Return a list of crops categorized as food crops https://world-crops.com/food-crops/
    url="https://world-crops.com/food-crops/"
    r=requests.get(url,headers={"User-Agent": "XY"})
    soup=BeautifulSoup(r.text,'html.parser')
    elements_temp=soup.find_all('a',href=re.compile("^../"))
    elements=[el.text for el in elements_temp]
    
    #only 40 elements are displayed on each page->iterating on the total list
    for i in range(40,401,40):
        url_i=url+"?ss="+str(i)
        r=requests.get(url_i,headers={"User-Agent":"XY"})
        soup=BeautifulSoup(r.text,'html.parser')
        new_elements=soup.find_all('a',href=re.compile("^../"))
        elements+=[el.text for el in new_elements]
    return elements

def inclusive_search(string,elements):
    #returns true if the string can be found in elements. The search removes special characters from string in order to include more positive results
    string=string.lower()
    delimiters = ",", "(","&",")"," and "," "
    pattern = '|'.join(map(re.escape, delimiters))
    strings=list(filter(None,re.split(pattern,string)))
    found=False
    for s in strings:
        if s=="nes":
            continue
        for el in elements:
            found=(s in el.split())
            if found==False and s[-1]=="s":
                found=s[:-1] in el.split()
            if found==False and s[-2:]=="es":
                found=s[:-2] in el.split()
            if found==False and s[-3:]=="ies":
                found=s[:-3]+"y" in el.split()
            if found==True:
                return found
    return found


def get_food_crop_data(df):    
    #extracts the food crop data, returns 4 df: Area,Production,Seed and yield    
    df=df.copy()
    food_crops=list(map(lambda x: x.lower(),get_food_crops()))              
    crop_types_df=df[['Item','Value']].groupby('Item').sum()
    crop_types_df=crop_types_df[list(map(lambda x : inclusive_search(x,food_crops) , crop_types_df.index ))]   
    food_crop_df=df[df.Item.apply(lambda x: x in crop_types_df.index)]
    return (food_crop_df[food_crop_df.Element=='Area harvested'],
            food_crop_df[food_crop_df.Element=='Production'],
            food_crop_df[food_crop_df.Element=='Seed'],
            food_crop_df[food_crop_df.Element=='Yield'])
  
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df["Crops production"])
In [14]:
df_useful['Crops Area harvested'] = food_crop_area_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Production'] = food_crop_production_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Seed'] = food_crop_seed_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Yield'] =  food_crop_yield_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])

We check everything is fine by looking at samples for each of the new dataframes:

In [15]:
display('Crops Area harvested', df_useful['Crops Area harvested'].sample(5))
display('Crops Production', df_useful['Crops Production'].sample(5))
display('Crops Seed', df_useful['Crops Seed'].sample(5))
display('Crops Yield', df_useful['Crops Yield'].sample(5))
'Crops Area harvested'
Area Code Area Item Year Unit Value
1563166 38 Sri Lanka Beans, dry 2002 ha 11250.0
734432 90 Guinea Nuts, nes 1990 ha NaN
1808422 235 Uzbekistan Chillies and peppers, dry 2010 ha 1600.0
2282951 5400 Europe Castor oil seed 1982 ha 168256.0
478194 51 Czechoslovakia Beans, green 1975 ha 1160.0
'Crops Production'
Area Code Area Item Year Unit Value
2553686 5815 Low Income Food Deficit Countries Mustard seed 1963 tonnes 70680.0
1152391 144 Mozambique Tomatoes 1996 tonnes 9500.0
1095081 137 Mauritius Roots and Tubers,Total 2007 tonnes 16779.0
2424543 5502 Melanesia Pineapples 1964 tonnes 6050.0
1187247 150 Netherlands Eggplants (aubergines) 2008 tonnes 43000.0
'Crops Seed'
Area Code Area Item Year Unit Value
1350962 173 Poland Lupins 1985 tonnes 17320.0
239099 233 Burkina Faso Fonio 2011 tonnes 966.0
2103275 5204 Central America Sweet potatoes 2014 tonnes 3650.0
122654 16 Bangladesh Sorghum 1980 tonnes 39.0
2572695 5817 Net Food Importing Developing Countries Cassava 1973 tonnes 80.0
'Crops Yield'
Area Code Area Item Year Unit Value
1759493 215 United Republic of Tanzania Fruit, tropical fresh nes 1965 hg/ha 25556.0
1529521 202 South Africa Carrots and turnips 1985 hg/ha 260000.0
411561 46 Congo Potatoes 1972 hg/ha 69231.0
1495790 197 Sierra Leone Groundnuts, with shell 1981 hg/ha 5263.0
1533616 202 South Africa Mangoes, mangosteens, guavas 1981 hg/ha 94887.0

We also make some plots to have a first understanding of the dataset:

In [16]:
select_Maize = df_useful['Crops Area harvested']['Item']=='Maize'
maize_df = df_useful['Crops Area harvested'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Crops Area harvested in ha for different countries for the period 1970-2015')
In [17]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Crops Area harvested in ha for different countries for the period 1970-2015')
1.D.a.iii. Extracting stocks production from the "Livestock production" dataset

We want to extract stocks production from the "Livestock production" dataset. Again, we drop the columns that are useless for us and have a first look of the data with a sample and some plots.

In [18]:
selection_stocks = df['Livestock production']["Element"] == 'Stocks'
df_useful['Livestock production'] = df['Livestock production'][selection_stocks].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [19]:
display(df_useful['Livestock production'].sample(5))
Area Code Area Item Year Unit Value
115173 217 Togo Pigs 1994 Head 420600.0
72161 134 Malta Cattle 1981 Head 11385.0
57204 102 Iran (Islamic Republic of) Cattle 1995 Head 8347000.0
121209 215 United Republic of Tanzania Asses 1978 Head 162000.0
4377 9 Argentina Sheep 1963 Head 46157904.0
In [20]:
select_pigs = df_useful['Livestock production']['Item']=='Pigs'
pigs_df = df_useful['Livestock production'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Pigs production in heads for different countries for the period 1970-2015')
In [21]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Pigs production in heads for different countries for the period 1970-2015')
1.D.a.iv. Extracting import and export quantities from the "Live animals trade" and "Crops trade" datasets

Now, we extract import and export quantities from the "Live animals trade" and "Crops trade" datasets, having again some samples and some plots.

In [22]:
selection_import_quantities = df['Live animals trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Live animals trade']["Element"] == 'Export Quantity'

df_useful['Live animals import quantities'] = df['Live animals trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Live animals export quantities'] = df['Live animals trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [23]:
display(df_useful['Live animals import quantities'].sample(5))
Area Code Area Item Year Unit Value
267653 122 Lesotho Pigs 1992 Head 49.0
401251 194 Saudi Arabia Cattle 1968 Head 71678.0
297393 138 Mexico Beehives 1997 No 3.0
470238 223 Turkey Ducks 2007 1000 Head 2.0
67318 26 Brunei Darussalam Bovine, Animals 1970 Head 2234.0
In [24]:
select_pigs = df_useful['Live animals import quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals import quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Pigs importation in heads for different countries for the period 1970-2015')
In [25]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Pigs importation in head for different countries for the period 1970-2015')
In [26]:
display(df_useful['Live animals export quantities'].sample(5))
Area Code Area Item Year Unit Value
367540 173 Poland Turkeys 1972 1000 Head NaN
345059 221 Oman Goats 1982 Head 14000.0
549843 5104 Southern Africa Turkeys 1969 1000 Head 0.0
39711 57 Belarus Bovine, Animals 1993 Head 3200.0
252702 114 Kenya Goats 1968 Head 20000.0
In [27]:
select_pigs = df_useful['Live animals export quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals export quantities'][select_pigs]

select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Pigs exportation in heads for different countries for the period 1970-2015')
In [28]:
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Pigs exportation in heads for different countries for the period 1970-2015')
In [29]:
selection_import_quantities = df['Food trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Food trade']["Element"] == 'Export Quantity'

df_useful['Food import quantities'] = df['Food trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Food export quantities'] = df['Food trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
In [30]:
display(df_useful['Food import quantities'].sample(5))
Area Code Area Item Year Unit Value
3215241 54 Denmark Tobacco 1964 tonnes 16656.0
5200015 104 Ireland Meat, goat 1997 tonnes 0.0
10249398 217 Togo Natural Rubber 1987 tonnes 1.0
1991525 37 Central African Republic Buttermilk, curdled, acidified milk 1994 tonnes 1.0
8442987 117 Republic of Korea Oil, rapeseed 1970 tonnes 70.0
In [31]:
select_Maize = df_useful['Food import quantities']['Item']=='Maize'
maize_df = df_useful['Food import quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Maize importation in tonnes for different countries for the period 1970-2015')
In [32]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Maize importation in tonnes for different countries for the period 1970-2015')
In [33]:
display(df_useful['Food export quantities'].sample(5))
Area Code Area Item Year Unit Value
7778436 165 Pakistan Glucose and dextrose 1986 tonnes 185.0
10998881 234 Uruguay Coconuts, desiccated 1989 tonnes NaN
11047297 234 Uruguay Wheat 1963 tonnes 48657.0
4065106 75 Gambia Meat, cattle, boneless (beef & veal) 1966 tonnes NaN
11609368 268 EU(25)ex.int Cake, cottonseed 1993 tonnes 10.0
In [34]:
select_Maize = df_useful['Food export quantities']['Item']=='Maize'
maize_df = df_useful['Food export quantities'][select_Maize]

select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Maize exportation in tonnes for different countries for the period 1970-2015')
In [35]:
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('Maize exportation in tonnes for different countries for the period 1970-2015')
1.D.a.v. Extracting average CPI of each year from the "Consumer price indices" dataset

The "Consumer price indices" dataset contains monthly data. In order to have a uniform dataframe, and as other dataframes have yearly data, we group it by (Country, Year) and compute the mean over the monthes.

In [36]:
df_useful['Consumer price indices'] =  df['Consumer price indices'][['Area',"Year",'Value']] \
                                        .dropna() \
                                        .groupby(['Area',"Year"]) \
                                        .mean() \
                                        .reset_index() \
                                        .dropna()

With samples and plots, we remark that this dataset only start in 2000 wheareas other ones start in 1970.

In [37]:
display(df_useful['Consumer price indices'].sample(5))
Area Year Value
3088 Trinidad and Tobago 2003 83.338459
3079 Tonga 2012 106.874292
2633 Sao Tome and Principe 2011 114.320902
2401 Portugal 2003 88.730413
408 Brazil 2000 50.709952
In [38]:
select_switzerland = df_useful['Consumer price indices']['Area']=='Switzerland'
select_france = df_useful['Consumer price indices']['Area']=='France'
select_austria = df_useful['Consumer price indices']['Area']=='Austria'
select_canada = df_useful['Consumer price indices']['Area']=='Canada'
ax = df_useful['Consumer price indices'][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful['Consumer price indices'][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
_ = ax.set_title('Consumer price indices in % for different countries for the period 1970-2015')
In [39]:
select_russia = df_useful["Consumer price indices"]['Area']=='Russian Federation'
select_ukraine = df_useful["Consumer price indices"]['Area']=='Ukraine'
ax = df_useful["Consumer price indices"][select_russia].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["Consumer price indices"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(['Russia', 'Ukraine'])
_ = ax.set_title('Consumer price indices in % for different countries for the period 1970-2015')
1.D.a.vi. Removing areas which are not countries

Having a more detailled look at the dataset, we have remarked that the areas which are real countries are exactely the ones with an "Area Code" below $5000$ but not in $[261, 269]$.

In [40]:
#remove Area code >= 5000 or in [261, 269] (EU)
for df_name in df_useful :
    if 'Area Code' in df_useful[df_name].keys() : 
        print ("Removing areas which are not countries in", df_name)
        selection_countries = df_useful[df_name]['Area Code']<261 
        selection_countries = selection_countries | (df_useful[df_name]['Area Code']>269)
        selection_countries = selection_countries & (df_useful[df_name]['Area Code']<5000)
        df_useful[df_name] = df_useful[df_name][selection_countries]
        display(df_useful[df_name].sample(5))
    else :
        print (df_name, "is already clean")
Removing areas which are not countries in GDP
Area Code Area Year Value
454232 210 Sweden 1982 112767.695010
475539 220 Trinidad and Tobago 2010 22157.966489
86616 33 Canada 1991 610319.305240
485263 224 Turks and Caicos Islands 2012 715.722768
66412 239 British Virgin Islands 2013 915.601000
Removing areas which are not countries in Crops Area harvested
Area Code Area Item Year Unit Value
1002786 124 Libya Cabbages and other brassicas 2014 ha 306.0
809892 100 India Roots and Tubers,Total 1974 ha 1130600.0
684821 84 Greece Carrots and turnips 1966 ha 500.0
1852585 249 Yemen Roots and Tubers,Total 2013 ha 22572.0
652249 75 Gambia Rice, paddy 1963 ha 25495.0
Removing areas which are not countries in Crops Production
Area Code Area Item Year Unit Value
1683218 220 Trinidad and Tobago Oranges 1996 tonnes 7140.0
47269 8 Antigua and Barbuda Pineapples 1989 tonnes 130.0
774490 95 Honduras Fruit excl Melons,Total 1965 tonnes 859263.0
830790 102 Iran (Islamic Republic of) Millet 1974 tonnes 5000.0
81273 10 Australia Soybeans 2005 tonnes 54000.0
Removing areas which are not countries in Crops Seed
Area Code Area Item Year Unit Value
1022596 129 Madagascar Cow peas, dry 1998 tonnes 141.0
856410 104 Ireland Rye 1979 tonnes 26.0
149383 23 Belize Groundnuts, with shell 2003 tonnes 4.0
1404416 182 Réunion Beans, dry 1975 tonnes 30.0
1725664 226 Uganda Beans, dry 1975 tonnes 21745.0
Removing areas which are not countries in Crops Yield
Area Code Area Item Year Unit Value
975666 119 Latvia Garlic 2003 hg/ha 34027.0
239428 233 Burkina Faso Grapefruit (inc. pomelos) 2014 hg/ha 150306.0
1071034 134 Malta Grapes 1979 hg/ha 56250.0
674529 81 Ghana Fruit, fresh nes 2007 hg/ha 38259.0
1096427 138 Mexico Asparagus 1979 hg/ha 51334.0
Removing areas which are not countries in Livestock production
Area Code Area Item Year Unit Value
25190 128 China, Macao SAR Poultry Birds 2013 1000 Head 730.0
114957 217 Togo Cattle 1994 Head 227100.0
14148 20 Botswana Horses 1964 Head 8664.0
86326 299 Occupied Palestinian Territory Chickens 1967 1000 Head NaN
42458 68 France Goats 1973 Head 898700.0
Removing areas which are not countries in Live animals import quantities
Area Code Area Item Year Unit Value
441071 210 Sweden Beehives 1988 No NaN
261737 119 Latvia Animals live nes 2011 Head 0.0
66251 26 Brunei Darussalam Horses 1963 Head 0.0
85710 33 Canada Mules 1971 Head NaN
198293 89 Guatemala Buffaloes 1967 Head NaN
Removing areas which are not countries in Live animals export quantities
Area Code Area Item Year Unit Value
333744 158 Niger Pigs 2009 Head 0.0
188931 81 Ghana Mules 1986 Head NaN
21856 10 Australia Pigs 2003 Head 556.0
32330 13 Bahrain Rabbits and hares 1961 1000 Head NaN
437022 207 Suriname Bovine, Animals 1967 Head 0.0
Removing areas which are not countries in Food import quantities
Area Code Area Item Year Unit Value
592668 52 Azerbaijan Juice, pineapple, concentrated 1992 tonnes NaN
8083077 171 Philippines Alfalfa meal and pellets 1983 tonnes 0.0
8734556 189 Saint Lucia Dates 2009 tonnes 2.0
2301812 128 China, Macao SAR Flour, roots and tubers nes 1966 tonnes 0.0
1385251 21 Brazil Cotton, carded, combed 1987 tonnes 49.0
Removing areas which are not countries in Food export quantities
Area Code Area Item Year Unit Value
11078558 228 USSR Wool, degreased 1980 tonnes 1574.0
5025761 101 Indonesia Pistachios 2007 tonnes 1.0
3872677 68 France Coffee, green 1980 tonnes 12266.0
1569190 27 Bulgaria Offals, liver geese 1963 tonnes NaN
7364045 157 Nicaragua Flour, maize 1991 tonnes 0.0
Consumer price indices is already clean

1.D.b. Handling of the missing data

In this section, we will explain how we will handle the missing data in previous dataframes for maps.

1.D.b.i. Highlighting the problem
In [41]:
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
_ = ax.legend(["USSR", 'Russia', 'Ukraine'])
_ = ax.set_title('GDP in million US$ for different countries for the period 1970-2015')

In order to vizualize folium maps, we need to associate each country a value. The geojson file that we use is not timestamped and only countries that exist nowadays are inside it. As some countries has been dissolved during the past 50 years, our folium maps won't be complete. For instance, we do not have any value for Ukraine from 1970 to 1989. Our idea to fix this issue is presented in the next paragraph.

1.D.b.ii. Proposed correction

Our idea is to map the former country value to each of the current ones. For instance in 1982, USSR GDP is around one trillion \$. Therefore, if we associate (only for folium map purposes) this value to each current country that succeeded USSR, all these countries will appear the same color in the folium map, i.e. all the USSR area will appear the same color (and the good one).

In order to do so, one need to identify which countries appeared and disappeared from the dataset and at which year. Then we will use this result along with some historical research in our visualise_world_data_folium function (1.E.a.).

In [42]:
countries_formation_years = {}
for country in df_useful["GDP"]["Area"].unique():
    selection = df_useful["GDP"]["Area"] == country
    year_in, year_out = df_useful["GDP"][selection].dropna()["Year"].min(), df_useful["GDP"][selection].dropna()["Year"].max()
    for year in (year_in, year_out):
        if year not in countries_formation_years :
            countries_formation_years[year] = []
    countries_formation_years[year_in].append((country,'+'))
    countries_formation_years[year_out].append((country,'-'))

countries_formation_years.pop(1970)
countries_formation_years.pop(2015)
for year in sorted(list(countries_formation_years)):
    print (year, countries_formation_years[year])
1988 [('Yemen Ar Rp', '-'), ('Yemen Dem', '-')]
1989 [('Czechoslovakia', '-'), ('Ethiopia PDR', '-'), ('USSR', '-'), ('Yemen', '+'), ('Yugoslav SFR', '-')]
1990 [('Armenia', '+'), ('Azerbaijan', '+'), ('Belarus', '+'), ('Bosnia and Herzegovina', '+'), ('Croatia', '+'), ('Czechia', '+'), ('Eritrea', '+'), ('Estonia', '+'), ('Ethiopia', '+'), ('Georgia', '+'), ('Kazakhstan', '+'), ('Kyrgyzstan', '+'), ('Latvia', '+'), ('Lithuania', '+'), ('Montenegro', '+'), ('Republic of Moldova', '+'), ('Russian Federation', '+'), ('Serbia', '+'), ('Slovakia', '+'), ('Slovenia', '+'), ('Tajikistan', '+'), ('The former Yugoslav Republic of Macedonia', '+'), ('Timor-Leste', '+'), ('Turkmenistan', '+'), ('Ukraine', '+'), ('Uzbekistan', '+')]
1999 [('Kosovo', '+')]
2005 [('Curaçao', '+'), ('Sint Maarten (Dutch Part)', '+')]
2007 [('Sudan (former)', '-')]
2008 [('South Sudan', '+'), ('Sudan', '+')]
2012 [('Netherlands Antilles (former)', '-')]

1.E. Preprocessing the data set

In this part, we will finish prepocessing the datasets. More precisely, we will deal with country names and normalizing the features.

      1. Converting country names between different naming conventions

      2. Normalization and log scales

1.E.a. Converting country names between different naming conventions

Some countries have different names in the geojson file and in the dataset. We first start by correcting them.

In [43]:
def correct_country_names(old_name, dic):
    if old_name in dic.keys() :
        return dic[old_name]
    return old_name
In [44]:
dic = {'Czechia': "Czech Republic",
       'Russian Federation':'Russia',
       "Serbia":"Republic of Serbia",
       'The former Yugoslav Republic of Macedonia':'Macedonia',
       'China, mainland':'China',
       'Viet Nam':'Vietnam',
       'Venezuela (Bolivarian Republic of)':'Venezuela',
       'Iran (Islamic Republic of)':'Iran',
       'Syrian Arab Republic':"Syria",
       'Bolivia (Plurinational State of)': 'Bolivia',
       "Côte d'Ivoire": "Ivory Coast",
       'Congo':"Republic of the Congo",
       "Lao People's Democratic Republic":'Laos',
       "Democratic People's Republic of Korea":"North Korea",
       'Republic of Korea':"South Korea"}

for df_name in df_useful :
    print (df_name)
    df_useful[df_name]["Area"] = df_useful[df_name]["Area"].apply(lambda x : correct_country_names(x,dic))
GDP
Crops Area harvested
Crops Production
Crops Seed
Crops Yield
Livestock production
Live animals import quantities
Live animals export quantities
Food import quantities
Food export quantities
Consumer price indices

Then, we do a function that takes as input a dataframe and a year and produces the corresponding folium map. This function also handles dissolutions of countries as suggested before.

In [45]:
def visualise_world_data_folium(df, year, logScale=True):
    dic = {'USSR':                            ['Armenia', 'Azerbaijan','Belarus', 'Estonia', 'Georgia',
                                               'Kazakhstan', 'Kyrgyzstan', 'Latvia', 'Lithuania',
                                               'Montenegro', 'Republic of Moldova', 'Russia',
                                               'Republic of Serbia', 'Timor-Leste', 'Turkmenistan', 'Ukraine',
                                               'Uzbekistan'],
           'Ethiopia PDR':                     ['Eritrea','Ethiopia'],
           'Yugoslav SFR':                     ['Kosovo', 'Slovenia', 'Croatia',
                                                'Macedonia', 'Bosnia and Herzegovina'],
           'Yemen Dem' :                       ['Yemen'],        
           'Czechoslovakia':                   ["Czech Republic", 'Slovakia'],
           'Netherlands Antilles (former)':    ['Curaçao', 'Sint Maarten (Dutch Part)'],
           'Sudan (former)':                   ['South Sudan', 'Sudan']
          }
    to_plot=df[df["Year"]==year]
    to_plot=(to_plot[['Area','Value']]
             .dropna()
             .groupby('Area')             
             .mean()
             .reset_index()
             .dropna()) 
    to_plot['Area']=to_plot['Area'].apply(lambda x : correct_country_names(x, dic))
    to_plot = to_plot.explode('Area')
    if logScale :
        to_plot.Value=np.log10(1+to_plot.Value)
    
    m = folium.Map(location=[40,-10],zoom_start=1.6)
    folium.Choropleth(
        geo_data=f"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json",
        data=to_plot,
        columns=['Area', 'Value'],
        key_on='feature.properties.name',
        fill_color='YlGn',fill_opacity=0.7,line_opacity=0.2,nan_fill_opacity=0.0
    ).add_to(m)

    folium.LayerControl().add_to(m)

    return(m)

We can know use it to produce some maps. For instance, we plot below the map of GDP for the year 1985 (before dissolution of USSR) and 1995 (after).

In [46]:
display(visualise_world_data_folium(df_useful["GDP"], 1985, True))
display(visualise_world_data_folium(df_useful["GDP"], 1995, True))

1.E.b. Normalization and log scales

Some of our features seem to be right skewed. At first glance it seems that they look like power laws.

For instance the distribution of GDP look a bit like a power law:

In [47]:
_ = sns.distplot(df_useful["GDP"]["Value"], rug=False, hist=False)

As we later want to train some Machine Learning models, we log those values so that their distribution look a bit more like a normal distribution.

In [48]:
#looks better with log scale
_ = sns.distplot(np.log(df_useful["GDP"]["Value"]), rug=False, hist=False)

The new distribution indeed looks better to train models on it.

1.F. Making one uniformized dataframe

In this part, we will make one uniformized dataframe uni_df with the following columns.

Country | Year | GDP | Crops production columns | Livestock production columns | Crops importation columns | Livestock importation columns | Crops exportation columns | Livestock exportation columns | CPI

In this uniformized dataframe, a tuple (Country, Year) uniquely identifies a row.

1.F.a. Pivoting dataframes with items

The current dataframes have several rows for a given (Country, Year). Each of this row correspond to one item. We would like to have a unique row for a given (Country, Year) and one column per item:

In [49]:
need_pivot = ['Crops Area harvested',
              'Crops Production',
              'Crops Seed',
              'Crops Yield',
              'Livestock production',
              'Live animals import quantities',
              'Live animals export quantities',
              'Food import quantities',
              'Food export quantities']

def rename_columns(x, word):
    if x not in ['Area', 'Year', 'ha', 'tonnes', 'hg/ha', 'Head', '1000 Head']:
        return x + ' ' + word
    return x

df_useful['GDP'] = df_useful['GDP'].rename(columns = {'Value':'(GDP, million $)'})[["Area",'Year','(GDP, million $)']]
df_useful['Consumer price indices'] = df_useful['Consumer price indices'].rename(columns = {'Value':'(Consumer price indices, %)'})[["Area",'Year','(Consumer price indices, %)']]

for df_name in need_pivot :
    df_useful[df_name] = pd.pivot_table(df_useful[df_name], index=["Area",'Year'], columns=["Item","Unit"], values="Value").rename(columns=lambda x: rename_columns(x, df_name))
    display(df_name, df_useful[df_name].sample(5))
'Crops Area harvested'
Item Anise, badian, fennel, coriander Crops Area harvested Apples Crops Area harvested Apricots Crops Area harvested Areca nuts Crops Area harvested Artichokes Crops Area harvested Asparagus Crops Area harvested Avocados Crops Area harvested Bambara beans Crops Area harvested Bananas Crops Area harvested Barley Crops Area harvested ... Sweet potatoes Crops Area harvested Tangerines, mandarins, clementines, satsumas Crops Area harvested Taro (cocoyam) Crops Area harvested Tomatoes Crops Area harvested Tung nuts Crops Area harvested Vegetables&Melons, Total Crops Area harvested Vetches Crops Area harvested Watermelons Crops Area harvested Wheat Crops Area harvested Yams Crops Area harvested
Unit ha ha ha ha ha ha ha ha ha ha ... ha ha ha ha ha ha ha ha ha ha
Area Year
Iran 1965 6500.0 21000.0 10000.0 NaN NaN NaN NaN NaN NaN 1300000.0 ... NaN 3800.0 NaN 16200.0 NaN 265200.0 NaN 93000.0 4700000.0 NaN
Maldives 1995 NaN NaN NaN 43.0 NaN NaN NaN NaN 135.0 NaN ... 65.0 NaN 220.0 NaN NaN 251.0 NaN NaN NaN NaN
Democratic Republic of the Congo 1998 NaN NaN NaN NaN NaN NaN 4042.0 13500.0 85192.0 691.0 ... 51066.0 NaN 12314.0 5614.0 NaN 70907.0 NaN NaN 8093.0 14000.0
Honduras 2012 NaN 84.0 NaN NaN NaN NaN 286.0 NaN 22817.0 NaN ... 1200.0 NaN NaN 4731.0 NaN 29427.0 NaN 2619.0 2110.0 NaN
Guam 1967 NaN NaN NaN NaN NaN NaN NaN NaN 30.0 NaN ... 17.0 NaN NaN NaN NaN 95.0 NaN NaN NaN NaN

5 rows × 120 columns

'Crops Production'
Item Anise, badian, fennel, coriander Crops Production Apples Crops Production Apricots Crops Production Areca nuts Crops Production Artichokes Crops Production Asparagus Crops Production Avocados Crops Production Bambara beans Crops Production Bananas Crops Production Barley Crops Production ... Sweet potatoes Crops Production Tangerines, mandarins, clementines, satsumas Crops Production Taro (cocoyam) Crops Production Tomatoes Crops Production Tung nuts Crops Production Vegetables&Melons, Total Crops Production Vetches Crops Production Watermelons Crops Production Wheat Crops Production Yams Crops Production
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Peru 2010 6430.0 143861.0 232.0 NaN 127323.0 335209.0 184370.0 NaN 78000.0 216193.0 ... 263456.0 221324.0 NaN 224897.0 NaN 2834949.0 NaN 91721.0 219454.0 NaN
Colombia 2006 NaN 4518.0 NaN NaN NaN 1254.0 191710.0 NaN 1863746.0 6591.0 ... NaN 95397.0 NaN 443587.0 NaN 1579802.0 NaN 88332.0 26399.0 260685.0
Martinique 1970 NaN NaN NaN NaN NaN NaN 2000.0 NaN 160000.0 NaN ... 5500.0 NaN NaN 5000.0 NaN 24500.0 NaN NaN NaN 22000.0
Mexico 1962 120.0 95429.0 5741.0 NaN NaN 5000.0 112934.0 NaN 710375.0 151115.0 ... 102793.0 70000.0 NaN 451502.0 NaN 1509570.0 NaN 337213.0 1455256.0 NaN
Angola 2013 NaN NaN NaN NaN NaN NaN NaN NaN 3095013.0 NaN ... 1199749.0 NaN NaN 17000.0 NaN 701763.0 NaN NaN 3090.0 NaN

5 rows × 122 columns

'Crops Seed'
Item Anise, badian, fennel, coriander Crops Seed Bambara beans Crops Seed Bananas Crops Seed Barley Crops Seed Beans, dry Crops Seed Broad beans, horse beans, dry Crops Seed Buckwheat Crops Seed Cabbages and other brassicas Crops Seed Carrots and turnips Crops Seed Cassava Crops Seed ... Sorghum Crops Seed Soybeans Crops Seed Sugar cane Crops Seed Sweet potatoes Crops Seed Taro (cocoyam) Crops Seed Vegetables&Melons, Total Crops Seed Vetches Crops Seed Watermelons Crops Seed Wheat Crops Seed Yams Crops Seed
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
United Kingdom 1974 NaN NaN NaN 389000.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 200000.0 NaN
Morocco 2006 NaN NaN NaN 159472.0 904.0 9040.0 NaN NaN NaN NaN ... 376.0 NaN NaN NaN NaN NaN 462.0 NaN 257190.0 NaN
Iran 1980 NaN NaN NaN 169085.0 7600.0 NaN NaN NaN NaN NaN ... NaN 3303.0 NaN NaN NaN NaN NaN NaN 777342.0 NaN
Italy 1991 NaN NaN NaN 93000.0 6000.0 24000.0 NaN NaN NaN NaN ... 1056.0 23000.0 NaN NaN NaN 12000.0 302.0 NaN 675000.0 NaN
Vanuatu 1981 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 52 columns

'Crops Yield'
Item Anise, badian, fennel, coriander Crops Yield Apples Crops Yield Apricots Crops Yield Areca nuts Crops Yield Artichokes Crops Yield Asparagus Crops Yield Avocados Crops Yield Bambara beans Crops Yield Bananas Crops Yield Barley Crops Yield ... Sweet potatoes Crops Yield Tangerines, mandarins, clementines, satsumas Crops Yield Taro (cocoyam) Crops Yield Tomatoes Crops Yield Tung nuts Crops Yield Vegetables&Melons, Total Crops Yield Vetches Crops Yield Watermelons Crops Yield Wheat Crops Yield Yams Crops Yield
Unit hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha ... hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha hg/ha
Area Year
Turkey 1988 7807.0 187500.0 76639.0 NaN 91667.0 37500.0 40000.0 NaN 261584.0 21893.0 ... NaN 143565.0 57500.0 358955.0 NaN 216094.0 7788.0 264000.0 21861.0 NaN
Cameroon 2003 NaN NaN 45439.0 NaN NaN NaN 40140.0 7401.0 93258.0 NaN ... 43491.0 NaN 55680.0 113462.0 NaN 33422.0 NaN 170124.0 13375.0 76602.0
Suriname 2003 NaN NaN NaN NaN NaN NaN NaN NaN 159750.0 NaN ... 115000.0 NaN NaN 122769.0 NaN 166106.0 NaN 186299.0 NaN NaN
Vanuatu 2006 NaN NaN NaN NaN NaN NaN NaN NaN 114800.0 NaN ... NaN NaN NaN NaN NaN 157542.0 NaN NaN NaN NaN
Panama 1971 NaN NaN NaN NaN NaN NaN NaN NaN 482238.0 NaN ... NaN NaN NaN 182576.0 NaN 124338.0 NaN 77143.0 NaN 75916.0

5 rows × 120 columns

'Livestock production'
Item Animals live nes Livestock production Asses Livestock production Beehives Livestock production Buffaloes Livestock production Camelids, other Livestock production Camels Livestock production Cattle Livestock production Cattle and Buffaloes Livestock production Chickens Livestock production Ducks Livestock production ... Horses Livestock production Mules Livestock production Pigeons, other birds Livestock production Pigs Livestock production Poultry Birds Livestock production Rabbits and hares Livestock production Rodents, other Livestock production Sheep Livestock production Sheep and Goats Livestock production Turkeys Livestock production
Unit Head Head No Livestock production Head Head Head Head Head 1000 Head 1000 Head ... Head Head 1000 Head Head 1000 Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Nigeria 2010 NaN 1200000.0 NaN NaN NaN 277727.0 16577962.0 16577962.0 192313.0 NaN ... 101509.0 NaN NaN 7471730.0 192313.0 3838.0 NaN 35519760.0 92043836.0 NaN
Netherlands Antilles (former) 1987 NaN 2600.0 NaN NaN NaN NaN 1000.0 1000.0 110.0 NaN ... NaN NaN NaN 3150.0 110.0 NaN NaN 6000.0 20000.0 NaN
Brazil 1986 NaN 1285754.0 550000.0 984811.0 NaN NaN 132221568.0 133206379.0 495640.0 5720.0 ... 5735208.0 1921404.0 NaN 32539344.0 504981.0 792.0 NaN 19659744.0 30255036.0 3621.0
Somalia 1986 NaN 24600.0 NaN NaN NaN 6378000.0 4571000.0 4571000.0 3170.0 NaN ... 890.0 23000.0 NaN 10100.0 3170.0 NaN NaN 12274000.0 31475008.0 NaN
Hungary 1989 2095500.0 4300.0 600000.0 NaN NaN NaN 1689700.0 1689700.0 56719.0 2005.0 ... 76300.0 350.0 NaN 8327000.0 61604.0 2332.0 NaN 2214800.0 2230400.0 1361.0

5 rows × 22 columns

'Live animals import quantities'
Item Animals live nes Live animals import quantities Asses Live animals import quantities Beehives Live animals import quantities Bovine, Animals Live animals import quantities Buffaloes Live animals import quantities Camelids, other Live animals import quantities Camels Live animals import quantities Cattle Live animals import quantities Chickens Live animals import quantities Ducks Live animals import quantities Goats Live animals import quantities Horses Live animals import quantities Mules Live animals import quantities Pigeons, other birds Live animals import quantities Pigs Live animals import quantities Rabbits and hares Live animals import quantities Rodents, other Live animals import quantities Sheep Live animals import quantities Sheep and Goats Live animals import quantities Turkeys Live animals import quantities
Unit Head Head No Live animals import quantities Head Head Head Head Head 1000 Head 1000 Head Head Head Head 1000 Head Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Canada 2000 NaN 281.0 NaN 352884.0 NaN NaN NaN 352884.0 26407.0 256.0 6521.0 26972.0 NaN NaN 7748.0 73.0 NaN 2802.0 9323.0 5715.0
Russia 2008 NaN NaN NaN 58352.0 NaN NaN NaN 58352.0 9518.0 NaN 306.0 485.0 NaN 0.0 770915.0 0.0 NaN 2.0 308.0 1051.0
Nepal 2008 NaN 601.0 NaN 35000.0 0.0 NaN NaN 35000.0 192.0 0.0 431758.0 27.0 NaN NaN 1830.0 NaN NaN 6241.0 437999.0 NaN
Timor-Leste 1980 NaN NaN NaN 3500.0 NaN NaN NaN 3500.0 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN
Senegal 2011 0.0 0.0 NaN 200000.0 NaN NaN NaN 200000.0 46.0 46.0 200000.0 0.0 NaN 34.0 0.0 NaN NaN 450000.0 650000.0 0.0
'Live animals export quantities'
Item Animals live nes Live animals export quantities Asses Live animals export quantities Beehives Live animals export quantities Bovine, Animals Live animals export quantities Buffaloes Live animals export quantities Camelids, other Live animals export quantities Camels Live animals export quantities Cattle Live animals export quantities Chickens Live animals export quantities Ducks Live animals export quantities Goats Live animals export quantities Horses Live animals export quantities Mules Live animals export quantities Pigeons, other birds Live animals export quantities Pigs Live animals export quantities Rabbits and hares Live animals export quantities Rodents, other Live animals export quantities Sheep Live animals export quantities Sheep and Goats Live animals export quantities Turkeys Live animals export quantities
Unit Head Head No Live animals export quantities Head Head Head Head Head 1000 Head 1000 Head Head Head Head 1000 Head Head 1000 Head 1000 Head Head Head 1000 Head
Area Year
Mauritius 2009 0.0 NaN NaN NaN NaN NaN NaN NaN 855.0 5.0 NaN 67.0 NaN NaN NaN NaN NaN NaN NaN NaN
Trinidad and Tobago 2002 0.0 NaN NaN NaN NaN NaN NaN NaN 30.0 NaN 5.0 26.0 NaN NaN 0.0 NaN NaN NaN 5.0 0.0
Panama 2001 NaN NaN NaN 22023.0 NaN NaN NaN 22023.0 480.0 NaN NaN 39.0 NaN NaN NaN NaN NaN 0.0 0.0 NaN
Burundi 2001 NaN NaN NaN 0.0 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.0 0.0 NaN
Italy 1986 NaN 52.0 NaN 3499.0 0.0 NaN NaN 3499.0 1386.0 55.0 0.0 114.0 3.0 0.0 2.0 16.0 NaN 797.0 797.0 370.0
'Food import quantities'
Item Alfalfa meal and pellets Food import quantities Almonds shelled Food import quantities Animal Oil+Fat+Grs Food import quantities Animal Vegetable Oil Food import quantities Animal fats Food import quantities Anise, badian, fennel, coriander Food import quantities Apples Food import quantities Apricots Food import quantities Apricots, dry Food import quantities Artichokes Food import quantities ... Wheat+Flour,Wheat Equivalent Food import quantities Whey, Pres+Concen Food import quantities Whey, condensed Food import quantities Whey, dry Food import quantities Wine Food import quantities Wine+Vermouth+Sim. Food import quantities Wool, degreased Food import quantities Wool, greasy Food import quantities Wool, hair waste Food import quantities Yoghurt, concentrated or not Food import quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
Saint Kitts and Nevis 1995 NaN NaN 0.0 618.0 0.0 NaN 128.0 NaN NaN NaN ... 3246.0 NaN NaN NaN 176.0 176.0 NaN NaN NaN NaN
Romania 1987 NaN 0.0 100.0 2900.0 100.0 NaN 0.0 NaN NaN NaN ... 0.0 500.0 NaN 500.0 0.0 0.0 0.0 1600.0 NaN NaN
Nepal 2007 NaN 126.0 504.0 182768.0 504.0 5353.0 26999.0 12.0 17.0 56.0 ... 1644.0 9.0 NaN 9.0 542.0 548.0 4343.0 605.0 5.0 0.0
Comoros 1977 NaN NaN NaN 0.0 NaN 0.0 NaN NaN NaN NaN ... 796.0 NaN NaN NaN 100.0 100.0 NaN NaN NaN NaN
Togo 1990 NaN NaN 4.0 12039.0 4.0 65.0 132.0 NaN NaN NaN ... 89557.0 252.0 NaN 252.0 1977.0 2159.0 NaN NaN NaN NaN

5 rows × 452 columns

'Food export quantities'
Item Alfalfa meal and pellets Food export quantities Almonds shelled Food export quantities Animal Oil+Fat+Grs Food export quantities Animal Vegetable Oil Food export quantities Animal fats Food export quantities Anise, badian, fennel, coriander Food export quantities Apples Food export quantities Apricots Food export quantities Apricots, dry Food export quantities Artichokes Food export quantities ... Wheat+Flour,Wheat Equivalent Food export quantities Whey, Pres+Concen Food export quantities Whey, condensed Food export quantities Whey, dry Food export quantities Wine Food export quantities Wine+Vermouth+Sim. Food export quantities Wool, degreased Food export quantities Wool, greasy Food export quantities Wool, hair waste Food export quantities Yoghurt, concentrated or not Food export quantities
Unit tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes ... tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes tonnes
Area Year
British Virgin Islands 1988 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Ghana 1969 NaN NaN NaN 58.0 NaN NaN NaN NaN NaN NaN ... 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN
Costa Rica 1972 NaN NaN 110.0 245.0 110.0 0.0 0.0 NaN NaN NaN ... 0.0 NaN NaN NaN 0.0 0.0 NaN NaN NaN NaN
Tonga 2005 NaN NaN NaN 0.0 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Nigeria 2009 NaN NaN NaN 17701.0 NaN NaN NaN NaN NaN NaN ... 102.0 0.0 NaN 0.0 3.0 3.0 NaN NaN NaN NaN

5 rows × 444 columns

Some Nan values have appeared. After some analysis, we have conclude to replace those Nan values by zeros. Indeed, it seems that those Nan values means that the value were very low and not significant to be measured.

In [50]:
# Deal with the NaN that appeared
for df_name in df_useful :
    if df_name != "GDP":
        for column in list(df_useful[df_name]):
            if column not in ['Area', 'Year']:
                df_useful[df_name][column].fillna(0, inplace=True)
In [51]:
#removing the multiindex, so that merge is clean with GDP and CPI
for df_name in need_pivot :
    df_useful[df_name].columns = [' '.join([str(_) for _ in v]) for v in df_useful[df_name].columns.values]
    display(df_useful[df_name].sample(5))
    
Anise, badian, fennel, coriander Crops Area harvested ha Apples Crops Area harvested ha Apricots Crops Area harvested ha Areca nuts Crops Area harvested ha Artichokes Crops Area harvested ha Asparagus Crops Area harvested ha Avocados Crops Area harvested ha Bambara beans Crops Area harvested ha Bananas Crops Area harvested ha Barley Crops Area harvested ha ... Sweet potatoes Crops Area harvested ha Tangerines, mandarins, clementines, satsumas Crops Area harvested ha Taro (cocoyam) Crops Area harvested ha Tomatoes Crops Area harvested ha Tung nuts Crops Area harvested ha Vegetables&Melons, Total Crops Area harvested ha Vetches Crops Area harvested ha Watermelons Crops Area harvested ha Wheat Crops Area harvested ha Yams Crops Area harvested ha
Area Year
Egypt 2010 28308.0 21088.0 6241.0 0.0 8909.0 0.0 0.0 0.0 22665.0 87752.0 ... 12118.0 39826.0 4098.0 216385.0 0.0 776758.0 750.0 54051.0 1287627.0 0.0
Timor-Leste 2003 0.0 0.0 0.0 0.0 0.0 0.0 110.0 0.0 60.0 0.0 ... 6843.0 0.0 0.0 95.0 0.0 8809.0 0.0 0.0 0.0 0.0
Vietnam 1986 1600.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 80000.0 0.0 ... 329000.0 0.0 0.0 0.0 0.0 308200.0 0.0 15000.0 0.0 0.0
Russia 2007 4790.0 364000.0 18000.0 0.0 0.0 0.0 0.0 0.0 0.0 8369600.0 ... 0.0 0.0 0.0 104200.0 0.0 721084.0 204500.0 107344.0 23500500.0 0.0
Rwanda 1990 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 400570.0 0.0 ... 175893.0 0.0 52137.0 0.0 0.0 21484.0 0.0 0.0 9313.0 500.0

5 rows × 120 columns

Anise, badian, fennel, coriander Crops Production tonnes Apples Crops Production tonnes Apricots Crops Production tonnes Areca nuts Crops Production tonnes Artichokes Crops Production tonnes Asparagus Crops Production tonnes Avocados Crops Production tonnes Bambara beans Crops Production tonnes Bananas Crops Production tonnes Barley Crops Production tonnes ... Sweet potatoes Crops Production tonnes Tangerines, mandarins, clementines, satsumas Crops Production tonnes Taro (cocoyam) Crops Production tonnes Tomatoes Crops Production tonnes Tung nuts Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Vetches Crops Production tonnes Watermelons Crops Production tonnes Wheat Crops Production tonnes Yams Crops Production tonnes
Area Year
China 1993 21000.0 9073621.0 28861.0 137467.0 10000.0 2103975.5 20000.0 0.0 2807054.0 4150000.0 ... 113243940.0 4468936.0 1277819.0 10371861.0 421027.0 175733372.5 0.0 17175289.5 106392460.5 0.0
Mauritius 2007 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 9026.0 0.0 ... 800.0 0.0 372.0 11117.0 0.0 65491.0 0.0 0.0 0.0 0.0
Vietnam 2012 4100.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1791937.0 0.0 ... 1427242.0 0.0 0.0 0.0 0.0 13773705.0 0.0 1115649.0 0.0 0.0
Australia 2010 3322.0 264401.0 13764.0 0.0 0.0 8991.0 41023.0 0.0 302173.0 7294000.0 ... 51210.0 91002.0 0.0 471883.0 0.0 1960601.0 14000.0 140793.0 22138000.0 0.0
Russia 2011 11894.0 1198000.0 60000.0 0.0 0.0 0.0 0.0 0.0 0.0 16938000.0 ... 0.0 0.0 0.0 2395399.0 0.0 15906955.0 143434.0 1574560.0 56239992.0 0.0

5 rows × 122 columns

Anise, badian, fennel, coriander Crops Seed tonnes Bambara beans Crops Seed tonnes Bananas Crops Seed tonnes Barley Crops Seed tonnes Beans, dry Crops Seed tonnes Broad beans, horse beans, dry Crops Seed tonnes Buckwheat Crops Seed tonnes Cabbages and other brassicas Crops Seed tonnes Carrots and turnips Crops Seed tonnes Cassava Crops Seed tonnes ... Sorghum Crops Seed tonnes Soybeans Crops Seed tonnes Sugar cane Crops Seed tonnes Sweet potatoes Crops Seed tonnes Taro (cocoyam) Crops Seed tonnes Vegetables&Melons, Total Crops Seed tonnes Vetches Crops Seed tonnes Watermelons Crops Seed tonnes Wheat Crops Seed tonnes Yams Crops Seed tonnes
Area Year
Germany 1998 0.0 0.0 0.0 327000.0 0.0 4771.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 473000.0 0.0
Bulgaria 2004 0.0 0.0 0.0 48383.0 557.0 0.0 0.0 0.0 0.0 0.0 ... 50.0 16.0 0.0 0.0 0.0 0.0 39.0 0.0 204183.0 0.0
China 1965 0.0 0.0 0.0 280455.0 200892.5 325500.0 294000.0 0.0 0.0 0.0 ... 227550.0 885085.0 0.0 117000.0 0.0 0.0 0.0 0.0 3335435.0 0.0
Republic of the Congo 1985 0.0 0.0 0.0 0.0 128.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Gambia 1965 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 210.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 52 columns

Anise, badian, fennel, coriander Crops Yield hg/ha Apples Crops Yield hg/ha Apricots Crops Yield hg/ha Areca nuts Crops Yield hg/ha Artichokes Crops Yield hg/ha Asparagus Crops Yield hg/ha Avocados Crops Yield hg/ha Bambara beans Crops Yield hg/ha Bananas Crops Yield hg/ha Barley Crops Yield hg/ha ... Sweet potatoes Crops Yield hg/ha Tangerines, mandarins, clementines, satsumas Crops Yield hg/ha Taro (cocoyam) Crops Yield hg/ha Tomatoes Crops Yield hg/ha Tung nuts Crops Yield hg/ha Vegetables&Melons, Total Crops Yield hg/ha Vetches Crops Yield hg/ha Watermelons Crops Yield hg/ha Wheat Crops Yield hg/ha Yams Crops Yield hg/ha
Area Year
Bolivia 1961 0.0 59259.0 0.0 0.0 0.0 0.0 50000.0 0.0 238095.0 6582.0 ... 102000.0 73684.0 0.0 120000.0 0.0 31580.0 0.0 0.0 5625.0 0.0
Guyana 2005 0.0 0.0 0.0 0.0 0.0 0.0 39985.0 0.0 76120.0 0.0 ... 19669.0 0.0 90436.0 53067.0 0.0 63661.0 0.0 92857.0 0.0 93750.0
United States of America 1997 0.0 247215.0 145612.0 0.0 112062.0 30674.0 60333.0 0.0 161719.0 31240.0 ... 181969.0 275288.0 138333.0 653069.0 0.0 264452.0 0.0 249587.0 26574.0 0.0
South Korea 1991 0.0 107128.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 37150.0 ... 217293.0 275144.0 0.0 361351.0 0.0 256523.0 0.0 239472.0 30955.0 0.0
Yemen 2009 0.0 87051.0 21679.0 0.0 0.0 0.0 0.0 0.0 129012.0 7189.0 ... 90402.0 149502.0 0.0 139045.0 0.0 119324.0 0.0 128815.0 18901.0 0.0

5 rows × 120 columns

Animals live nes Livestock production Head Asses Livestock production Head Beehives Livestock production No Livestock production Buffaloes Livestock production Head Camelids, other Livestock production Head Camels Livestock production Head Cattle Livestock production Head Cattle and Buffaloes Livestock production Head Chickens Livestock production 1000 Head Ducks Livestock production 1000 Head ... Horses Livestock production Head Mules Livestock production Head Pigeons, other birds Livestock production 1000 Head Pigs Livestock production Head Poultry Birds Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Rodents, other Livestock production 1000 Head Sheep Livestock production Head Sheep and Goats Livestock production Head Turkeys Livestock production 1000 Head
Area Year
Egypt 2003 0.0 3150000.0 1437000.0 3777000.0 0.0 136000.0 4227000.0 8004000.0 95000.0 9600.0 ... 62000.0 1150.0 13200.0 30500.0 125973.0 7300.0 0.0 4939000.0 8750000.0 1360.0
Comoros 2001 0.0 5000.0 0.0 0.0 0.0 0.0 53000.0 53000.0 490.0 0.0 ... 0.0 0.0 0.0 0.0 490.0 0.0 0.0 21000.0 134150.0 0.0
Ghana 1965 0.0 8891.0 0.0 0.0 0.0 0.0 750000.0 750000.0 7935.0 0.0 ... 3706.0 0.0 0.0 151000.0 7935.0 0.0 0.0 1130000.0 2280000.0 0.0
Mali 1985 0.0 436000.0 32000.0 0.0 0.0 230000.0 4344000.0 4344000.0 20000.0 0.0 ... 54600.0 0.0 0.0 55200.0 20000.0 0.0 0.0 5000000.0 9847000.0 0.0
Bolivia 2011 0.0 635000.0 0.0 0.0 3077887.0 0.0 8400439.0 8400439.0 195027.0 560.0 ... 483000.0 82000.0 0.0 2712800.0 195742.0 260.0 425.0 8877510.0 11132223.0 155.0

5 rows × 22 columns

Animals live nes Live animals import quantities Head Asses Live animals import quantities Head Beehives Live animals import quantities No Live animals import quantities Bovine, Animals Live animals import quantities Head Buffaloes Live animals import quantities Head Camelids, other Live animals import quantities Head Camels Live animals import quantities Head Cattle Live animals import quantities Head Chickens Live animals import quantities 1000 Head Ducks Live animals import quantities 1000 Head Goats Live animals import quantities Head Horses Live animals import quantities Head Mules Live animals import quantities Head Pigeons, other birds Live animals import quantities 1000 Head Pigs Live animals import quantities Head Rabbits and hares Live animals import quantities 1000 Head Rodents, other Live animals import quantities 1000 Head Sheep Live animals import quantities Head Sheep and Goats Live animals import quantities Head Turkeys Live animals import quantities 1000 Head
Area Year
Mozambique 1967 0.0 0.0 0.0 639.0 0.0 0.0 0.0 639.0 1058.0 0.0 0.0 51.0 0.0 0.0 40.0 0.0 0.0 7.0 7.0 0.0
Oman 2007 0.0 0.0 0.0 65.0 0.0 0.0 0.0 65.0 33.0 73.0 1073003.0 137.0 0.0 0.0 0.0 0.0 0.0 430260.0 1503263.0 0.0
Kuwait 1961 0.0 0.0 0.0 9000.0 0.0 0.0 0.0 9000.0 900.0 0.0 70000.0 0.0 0.0 0.0 0.0 0.0 0.0 250000.0 320000.0 0.0
Burkina Faso 1976 0.0 0.0 0.0 893.0 0.0 0.0 0.0 893.0 2.0 0.0 0.0 6.0 0.0 0.0 0.0 0.0 0.0 3613.0 3613.0 0.0
Iceland 1988 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Animals live nes Live animals export quantities Head Asses Live animals export quantities Head Beehives Live animals export quantities No Live animals export quantities Bovine, Animals Live animals export quantities Head Buffaloes Live animals export quantities Head Camelids, other Live animals export quantities Head Camels Live animals export quantities Head Cattle Live animals export quantities Head Chickens Live animals export quantities 1000 Head Ducks Live animals export quantities 1000 Head Goats Live animals export quantities Head Horses Live animals export quantities Head Mules Live animals export quantities Head Pigeons, other birds Live animals export quantities 1000 Head Pigs Live animals export quantities Head Rabbits and hares Live animals export quantities 1000 Head Rodents, other Live animals export quantities 1000 Head Sheep Live animals export quantities Head Sheep and Goats Live animals export quantities Head Turkeys Live animals export quantities 1000 Head
Area Year
Georgia 1998 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 110.0 0.0 0.0 0.0 0.0 0.0
Singapore 2004 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 1.0 19082.0 177.0 0.0 0.0 0.0 0.0 0.0 1921.0 21003.0 0.0
Saint Vincent and the Grenadines 2011 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Benin 1984 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Costa Rica 1970 0.0 0.0 0.0 329.0 0.0 0.0 0.0 329.0 100.0 0.0 0.0 12.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Alfalfa meal and pellets Food import quantities tonnes Almonds shelled Food import quantities tonnes Animal Oil+Fat+Grs Food import quantities tonnes Animal Vegetable Oil Food import quantities tonnes Animal fats Food import quantities tonnes Anise, badian, fennel, coriander Food import quantities tonnes Apples Food import quantities tonnes Apricots Food import quantities tonnes Apricots, dry Food import quantities tonnes Artichokes Food import quantities tonnes ... Wheat+Flour,Wheat Equivalent Food import quantities tonnes Whey, Pres+Concen Food import quantities tonnes Whey, condensed Food import quantities tonnes Whey, dry Food import quantities tonnes Wine Food import quantities tonnes Wine+Vermouth+Sim. Food import quantities tonnes Wool, degreased Food import quantities tonnes Wool, greasy Food import quantities tonnes Wool, hair waste Food import quantities tonnes Yoghurt, concentrated or not Food import quantities tonnes
Area Year
Zimbabwe 2000 0.0 5.0 10256.0 62074.0 10256.0 25.0 6227.0 1.0 2.0 10.0 ... 57922.0 194.0 0.0 194.0 469.0 469.0 20.0 0.0 0.0 64.0
Faroe Islands 1998 7.0 5.0 8.0 201.0 8.0 0.0 481.0 1.0 2.0 0.0 ... 2469.0 0.0 0.0 0.0 210.0 218.0 10.0 0.0 0.0 31.0
France 1971 4995.0 13126.0 62073.0 628226.0 62073.0 1136.0 52061.0 8246.0 1037.0 45542.0 ... 232734.0 4943.0 0.0 4943.0 516766.0 555532.0 10821.0 160227.0 5807.0 0.0
Macedonia 2012 0.0 367.0 1074.0 60455.0 1074.0 60.0 671.0 27.0 36.0 0.0 ... 181056.0 1083.0 0.0 1083.0 291.0 318.0 18.0 93.0 0.0 2113.0
Iraq 1974 0.0 0.0 4946.0 83160.0 4946.0 250.0 13851.0 0.0 0.0 0.0 ... 671817.0 0.0 0.0 0.0 11.0 80.0 659.0 80.0 0.0 0.0

5 rows × 452 columns

Alfalfa meal and pellets Food export quantities tonnes Almonds shelled Food export quantities tonnes Animal Oil+Fat+Grs Food export quantities tonnes Animal Vegetable Oil Food export quantities tonnes Animal fats Food export quantities tonnes Anise, badian, fennel, coriander Food export quantities tonnes Apples Food export quantities tonnes Apricots Food export quantities tonnes Apricots, dry Food export quantities tonnes Artichokes Food export quantities tonnes ... Wheat+Flour,Wheat Equivalent Food export quantities tonnes Whey, Pres+Concen Food export quantities tonnes Whey, condensed Food export quantities tonnes Whey, dry Food export quantities tonnes Wine Food export quantities tonnes Wine+Vermouth+Sim. Food export quantities tonnes Wool, degreased Food export quantities tonnes Wool, greasy Food export quantities tonnes Wool, hair waste Food export quantities tonnes Yoghurt, concentrated or not Food export quantities tonnes
Area Year
Syria 2005 0.0 770.0 0.0 46805.0 0.0 41657.0 38494.0 2893.0 145.0 11.0 ... 463212.0 0.0 0.0 0.0 0.0 0.0 2439.0 2998.0 0.0 0.0
Ecuador 1989 0.0 0.0 0.0 1892.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 61.0 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0
United Republic of Tanzania 1990 0.0 0.0 0.0 389.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Brunei Darussalam 1993 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Saint Lucia 1975 0.0 0.0 0.0 2778.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 444 columns

1.F.b. Merging everything

We are now creating the uniformized dataframe uni_df. Each row corresponds to one tuple (Country, Year) so that we can later group by country or year. In addition to the country ("Area") and the "Year", the columns will either be an economic feature ("GDP", "CPI") or an agricultural one (some crop harvested area, some livestock export quantity, ...). With this uniformized dataframe, we can later on analyze correlations and links between different feautures place and yearwise. This means we can measure a correlation of e.g. GDP with the production of a specific crop over all countries and all years.

In [52]:
uni_df = df_useful['GDP'].dropna()
for df_name in need_pivot :
    uni_df = pd.merge(uni_df, df_useful[df_name], how='left', on=['Area', 'Year'])
uni_df = pd.merge(uni_df,df_useful['Consumer price indices'], how='left', on=['Area', 'Year'])

# Deal with the NaN that appeared
for column in list(uni_df):
    if column not in ['Area', 'Year']:
        uni_df[column].fillna(0, inplace=True)
uni_df.sample(30)
Out[52]:
Area Year (GDP, million $) Anise, badian, fennel, coriander Crops Area harvested ha Apples Crops Area harvested ha Apricots Crops Area harvested ha Areca nuts Crops Area harvested ha Artichokes Crops Area harvested ha Asparagus Crops Area harvested ha Avocados Crops Area harvested ha ... Whey, Pres+Concen Food export quantities tonnes Whey, condensed Food export quantities tonnes Whey, dry Food export quantities tonnes Wine Food export quantities tonnes Wine+Vermouth+Sim. Food export quantities tonnes Wool, degreased Food export quantities tonnes Wool, greasy Food export quantities tonnes Wool, hair waste Food export quantities tonnes Yoghurt, concentrated or not Food export quantities tonnes (Consumer price indices, %)
2384 North Korea 2000 10607.906977 0.0 70505.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4070 Israel 1998 115952.767670 0.0 4300.0 735.0 0.0 616.0 50.0 5940.0 ... 0.0 0.0 0.0 2700.0 2820.0 0.0 0.0 80.0 0.0 0.000000
5147 Marshall Islands 1988 71.333786 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
2057 Cook Islands 1986 36.364019 0.0 0.0 0.0 0.0 0.0 0.0 80.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
2780 Equatorial Guinea 1982 62.160525 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4025 Ireland 1999 98778.690314 0.0 700.0 0.0 0.0 0.0 0.0 0.0 ... 39005.0 4437.0 34568.0 440.0 640.0 387.0 5743.0 7.0 1752.0 0.000000
5805 Netherlands 1976 108407.171300 2034.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 52841.0 0.0 52841.0 1149.0 1945.0 3397.0 2236.0 1012.0 0.0 0.000000
4171 Jamaica 2007 12799.597772 0.0 0.0 0.0 0.0 0.0 0.0 423.0 ... 0.0 0.0 0.0 477.0 489.0 0.0 0.0 0.0 0.0 87.667192
7946 Sweden 1993 209950.073400 0.0 2000.0 0.0 0.0 0.0 0.0 0.0 ... 589.0 0.0 589.0 39.0 97.0 8.0 50.0 0.0 67.0 0.000000
6080 Nigeria 1978 126449.750530 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
7128 Samoa 2008 656.818068 0.0 0.0 0.0 0.0 0.0 0.0 40.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 93.326148
3312 Greece 1976 31162.067822 900.0 19600.0 6600.0 0.0 3900.0 400.0 0.0 ... 0.0 0.0 0.0 75022.0 76264.0 571.0 186.0 0.0 0.0 0.000000
1875 China 1993 623053.793920 25000.0 2250790.5 6565.5 32632.0 3558.0 550762.5 2000.0 ... 2.0 0.0 3130.5 4658.5 5697.0 12884.5 9858.5 457.0 582.5 0.000000
4272 Kazakhstan 1990 29715.748139 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
5521 Montserrat 2014 58.985185 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 104.256287
7419 Sierra Leone 1997 1151.076330 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4858 Madagascar 1975 1844.039747 0.0 3565.0 35.0 0.0 0.0 0.0 1870.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
3405 Grenada 1977 57.173517 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4636 Lesotho 2009 1711.475803 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 400.0 0.0 0.0 97.092381
5137 Marshall Islands 1978 22.412141 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
6815 Republic of Moldova 1997 1928.585332 0.0 91804.0 3169.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 197906.0 293640.0 168.0 1709.0 17.0 0.0 0.000000
6832 Republic of Moldova 2014 7983.223921 0.0 44459.0 2446.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 112.126622
6084 Nigeria 1982 180427.402420 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
4359 Kiribati 1985 29.975261 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000
8588 Ukraine 1999 32705.564369 2080.0 245500.0 12000.0 0.0 0.0 0.0 0.0 ... 134.0 134.0 0.0 15744.0 15995.0 0.0 986.0 0.0 0.0 0.000000
1709 Chad 2011 11167.946516 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 96.295704
2184 Croatia 1995 22387.757742 0.0 5125.0 315.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 19294.0 24907.0 47.0 335.0 38.0 563.0 0.000000
6878 Romania 2014 199325.923550 3169.0 56130.0 2984.0 0.0 55.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 177.102070
596 Bahamas 2008 8246.650156 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 96.679554
7132 Samoa 2012 800.055756 0.0 0.0 0.0 0.0 0.0 0.0 42.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 102.391117

30 rows × 1376 columns

2.A.a Crops and livestock production and trade

For the next milestone, we will also produce some maps, showing for instance the production of a specific crop per country over the years. We have shown in previous parts that the dataset contains the necessary data and that we can handle the data in its size and plot maps.

2.A.b Introducing the concept of food self-sufficiency

In this section we will present and compute the notion of food self-sufficiency. We will use the quantitative definition of the Food and Agriculture Organization (FAO).

2.A.b.i Basic idea

One may wonder how to know whether a country produces all the food it needs or not. The notion of food-self-sufficency allows to answer to this question. More formally, it is a rate that decribes how to which degree a country can satisfy to meet its internal consumption needs by production. It describes the extent to which a country is able to feed its population through its domestic food production. We are interested in this measure since we thinkit could be correlated with the economic conditions of this country, particularly price stability. Price stability is defined in the next part.

2.A.b.ii Formula and computation

In order to compute the food self-sufficiency, we will apply the following formula that gives us the food self-sudfficiency as a percentage :

$$\frac{Production \times 100}{Production + Imports – Exports}$$

The following is a trial calculation of self-sufficiency. Refining which agriculutral products should go into this calculation still need to be done for next milestone. Indeed with our first calculations it seems that the self-sufficiency is always lower than 100% whereas this should not be the case.

In [53]:
#We calculate food self sufficiency for the most commonly produced and traded crop (by mass) which is "cereals". 

production_columns = list(uni_df.filter(like="Cereals (Rice Milled Eqv) Crops Production tonnes"))
import_columns = list(uni_df.filter(like="Cereals Food export quantities tonnes"))
export_columns = list(uni_df.filter(like="Cereals Food import quantities tonnes"))
            
uni_df[('All productions','tonnes')] = 0
for column in production_columns :
    uni_df[('All productions','tonnes')] += uni_df[column]            
uni_df[('All imports','tonnes')] = 0
for column in import_columns :
    uni_df[('All imports','tonnes')] += uni_df[column]            
uni_df[('All exports','tonnes')] = 0
for column in export_columns :
    uni_df[('All exports','tonnes')] += uni_df[column]
            
uni_df[('food self-sufficiency','%')] = 100 * uni_df[('All productions','tonnes')] / (uni_df[('All productions','tonnes')]+uni_df[('All imports','tonnes')]+uni_df[('All exports','tonnes')])
In [54]:
display(uni_df[['Area','Year',('food self-sufficiency','%')]].sample(5))
Area Year (food self-sufficiency, %)
4620 Lesotho 1993 36.058295
8003 Switzerland 2004 66.386409
6170 Occupied Palestinian Territory 1976 0.000000
1199 British Virgin Islands 2007 0.000000
3660 Haiti 2002 37.366354
In [55]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df[('food self-sufficiency','%')]
for year in range(1980, 2010, 5):
    display(year, visualise_world_data_folium(plot, year, False))
    
1980
1985
1990
1995
2000
2005

2.B. Consumer price indices

      1. Definition

      2. Usage

2.B.a. Definition

Consumer price indices (CPI) are a way to measure the changes of the average price level of goods. Typically a "basket of consumer goods and services" is used to calculate average consumer prices each year. Then, the relative change of these prices is used as a measure of inflation or deflation over a period of time. More technically, for a given item, the CPI is the ratio of the market basket for two different years. Global CPI is an average of sigle item CPI with some standardized weights. The FAO dataset includes the consumer prices, food indices. This means we have information about countries food price stability over the years.

2.B.b. Usage

The CPI has many uses and is often taken into consideration. For instance it is used for budget and pension revisions, monetary and economic policies, and economic analysis. It is a good indicator of relative price stability, which is essential for development and economic safety. The european central banks main objective is price stability in the euro-zone of keeping the consumer price index below a growth of 2% per year.

We will use the CPI to answer the following questions: "Are prices more stables in more self-sufficient countries ?", "Is there a link between the CPI and other agricultural features ?"

2.C. Structure of international trade and historical context

Our dataset contains data for the historical period from 1970 to 2015. In order to be able to correctly interpret the results we are going to see, we first made a brief historical research on this period. Listed below are important events of this period for which we think they have had a significant influence on the agriculture and the economy.

There was the Cold war from 1945 to 1990 with two economic superpowers (USA and USSR). The USSR had been dissolved in 1991. The Japanese economic miracle occured from 1945 to 1990 and allowed Japan to come out of the disastrous state in which it was at the exit of the WW2 and become one of the worlds largest economies. There have been 2 big oil crises, in 1973 and 1979. There have been many wars (Middle East wars 1973-2000 e.g. Yom Kippur War 1973, Islamic Revolution in Iran 1979, Iran–Iraq war 1980-1988, Gulf war 1990-1991, Yugoslav wars 1991-2001...). We have already seen some consequences of such events by dealing with countries names in a previous section.

The third Agricultural Revolution (also known as Green revolution) occurs form 1960 to 1990 and improved agricultural productions thanks to fertilizers and chemicals.

The following public-domain image from Wikimedia represents developed countries (blue), developing ones (orange) and least developed ones (red) according to the United Nations and International Monetary Fund. We expect to see similar results with our dataset (GDP).

The following image, also from Wikimedia shows the cumulative commercial balance for the period 1980-2008. We also expect to see similar results with our dataset, but there might be difference as we focus on agriculture.

In order to have an idea of the international trade and economy structure, we are interested in GDP:

In [56]:
pivoted_GDP_df = uni_df[['Area','Year']]
pivoted_GDP_df["GDP"] = uni_df["(GDP, million $)"]
pivoted_GDP_df = pivoted_GDP_df.pivot_table(index='Year', columns='Area', values="GDP").dropna(axis=1)
In [57]:
pivoted_GDP_df.sample(5)
Out[57]:
Area Afghanistan Albania Algeria Andorra Angola Anguilla Antigua and Barbuda Argentina Aruba Australia ... United Arab Emirates United Kingdom United Republic of Tanzania United States of America Uruguay Vanuatu Venezuela Vietnam Zambia Zimbabwe
Year
2014 21123.486862 13277.963577 213983.063000 3350.599726 145712.177590 311.344444 1274.330316 570722.90615 2649.720670 1.451276e+06 ... 401957.987830 2.998834e+06 49488.305205 17393103.0 57235.942016 814.933983 363266.237610 186204.652920 27134.904865 14197.000000
2002 4285.170218 4355.865889 56819.210891 1733.195719 15285.592487 157.444444 809.754485 110583.59032 1941.094724 4.350409e+05 ... 109816.091080 1.757510e+06 14140.007894 10977514.0 13606.503907 264.219504 92889.443990 35064.105501 4193.839169 7111.380000
2003 4935.549827 5561.459461 67863.851626 2398.553103 17812.704626 169.777778 850.218605 140444.13968 2021.305585 5.586089e+05 ... 124346.476180 2.028579e+06 15244.794315 11510670.0 12045.652609 314.455046 83529.234838 39552.513231 4901.874364 6705.440000
2006 7514.261673 8896.076004 117027.277940 3543.075303 52384.484190 284.262963 1149.025498 235423.73814 2424.581006 8.181896e+05 ... 222116.532960 2.678344e+06 18823.844159 13855888.0 19579.486729 439.376662 183477.410340 66371.664817 12756.947254 6101.630000
1998 2911.960069 2554.868838 48079.007974 1533.515782 8617.787964 138.637505 727.860578 324135.10474 1665.363128 3.896419e+05 ... 75674.384205 1.623501e+06 12362.402801 9089168.0 25385.887793 259.835075 91336.823522 27209.602050 3537.731099 8401.386477

5 rows × 179 columns

As we can see on a subset of the correlation matrix below, GDP are often hugely correlated between countries.

In [58]:
selected_countries = ['Algeria', 'Australia', 'Austria', 'Bangladesh', 'China',
                      'Djibouti', 'France', 'Germany', 'India', 'Japan', 'Mali',
                      'Switzerland', 'United States of America']

corr = pivoted_GDP_df[selected_countries].corr()
corr.style.background_gradient(cmap='coolwarm')
Out[58]:
Area Algeria Australia Austria Bangladesh China Djibouti France Germany India Japan Mali Switzerland United States of America
Area
Algeria 1 0.976574 0.907284 0.933577 0.929756 0.944831 0.900182 0.869324 0.968863 0.709052 0.97585 0.940471 0.894591
Australia 0.976574 1 0.942796 0.951553 0.940319 0.949289 0.932475 0.911348 0.983419 0.772262 0.988519 0.971698 0.929846
Austria 0.907284 0.942796 1 0.893578 0.826997 0.918576 0.998487 0.994014 0.904309 0.904562 0.936305 0.983805 0.977718
Bangladesh 0.933577 0.951553 0.893578 1 0.982876 0.988926 0.875213 0.865189 0.984286 0.70948 0.976807 0.946768 0.913102
China 0.929756 0.940319 0.826997 0.982876 1 0.955692 0.80428 0.78567 0.980841 0.601672 0.963651 0.898221 0.842533
Djibouti 0.944831 0.949289 0.918576 0.988926 0.955692 1 0.905293 0.898016 0.971203 0.764799 0.972581 0.963414 0.937559
France 0.900182 0.932475 0.998487 0.875213 0.80428 0.905293 1 0.993633 0.888926 0.907598 0.924552 0.976428 0.973908
Germany 0.869324 0.911348 0.994014 0.865189 0.78567 0.898016 0.993633 1 0.86628 0.936339 0.904118 0.973805 0.967602
India 0.968863 0.983419 0.904309 0.984286 0.980841 0.971203 0.888926 0.86628 1 0.700805 0.992389 0.951396 0.909
Japan 0.709052 0.772262 0.904562 0.70948 0.601672 0.764799 0.907598 0.936339 0.700805 1 0.744362 0.875918 0.888416
Mali 0.97585 0.988519 0.936305 0.976807 0.963651 0.972581 0.924552 0.904118 0.992389 0.744362 1 0.971381 0.927113
Switzerland 0.940471 0.971698 0.983805 0.946768 0.898221 0.963414 0.976428 0.973805 0.951396 0.875918 0.971381 1 0.970197
United States of America 0.894591 0.929846 0.977718 0.913102 0.842533 0.937559 0.973908 0.967602 0.909 0.888416 0.927113 0.970197 1

The correlation matrix contains lots of values that are very closed to one (red). This is also true for the whole correlation matrix as seen below:

In [59]:
f = plt.figure(figsize=(19, 15))
plt.matshow(pivoted_GDP_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We then try to clusterize this correlation matrix in order to find countries whose GDP are correlated:

In [60]:
corr = pivoted_GDP_df.corr().values
pdist = spc.distance.pdist(corr)   # vector of ('55' choose 2) pairwise distances
linkage = spc.linkage(pdist, method='complete')
ind = spc.fcluster(linkage, 0.32*pdist.max(), 'distance')
columns = [pivoted_GDP_df.columns.tolist()[i] for i in list((np.argsort(ind)))]
clusterised_df = pivoted_GDP_df.reindex(columns, axis=1)

f = plt.figure(figsize=(19, 15))
plt.matshow(clusterised_df.corr(), fignum=f.number)
cb = plt.colorbar()
cb.ax.tick_params()
plt.title('Correlation Matrix', fontsize=16);

We have found regions in which the GDP is highly correlated and between which the correlation coefficent is lower. We could refine the big clusters by iterating this method.

Interpretation: The correlation matrix of GDP contains lots of values that are very closed to one. This means that GDP in two different countries have a trend to evolve the same way. Therefore, we can say that the world countries have strong enough trading relations to make the GDP evolve the same way. The fact that we have found some main clusters could be interpreted as regions in which the trading relations are more important.

2.D. Economic classification of countries

Below, we plot the distribution of GDP over the world for the last decades:

In [61]:
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df["(GDP, million $)"]
for year in range(1980, 2015, 5):
    display(year, visualise_world_data_folium(plot, year, True))
1980
1985
1990
1995
2000
2005
2010

The countries with high GDP indeed correpond to the most developped countries. The trends we can observe from those plots look very significant (USSR dissolution, China economic growth, ...)

3.A. Production and consumption of food from a time-space geography point of view

For the next milestone, this part will be put to the end of the file and conducted for specific agricultural features (e.g. trade of crop most highly correlated to GDP) after the most important ones have been identified.

3.B. Food self-sufficiency

We also plan for next milestone to analyse in more details the correlation between food self-sufficiency and economic success.

In [62]:
uni_df[('food self-sufficiency','%')].corr(method='pearson', other =  uni_df['(Consumer price indices, %)'])
Out[62]:
0.036316065740648606
In [63]:
uni_df[('food self-sufficiency','%')].corr(method='pearson', other =  uni_df["(GDP, million $)"])
Out[63]:
0.08978351936674142

3.C. Analysis of food production and trade with GDP

3.C.a. Food production and GDP

Out of the crops and the livestock produced, we will focus on the top 20 crops as well as top 20 livestock produced globally to be able to have a model with a reasonable amount of features.

In [64]:
#Choose the top 20 crops produced in the world for the first model
crop_production_df= uni_df.filter(like='Crops Production tonnes') 
crop_production_df = crop_production_df.sum(axis=0).sort_values(ascending=False)
crop_production_list = crop_production_df.index.tolist()[:20]
crop_production_list.insert(0,'(GDP, million $)')
crop_production_list
#Take the top 20 crops and the columns of our uni_df which contain their production
maincrops_df = uni_df[crop_production_list]

#Livestock
livestock_production_df= uni_df.filter(like='Livestock production') 
livestock_production_df = livestock_production_df.sum(axis=0).sort_values(ascending=False)
livestock_production_list = livestock_production_df.index.tolist()[:20]
livestock_production_list.insert(0,'(GDP, million $)')
livestock_production_list
#Take the top 20 crops and the columns of our uni_df which contain their production
mainlivestock_df = uni_df[livestock_production_list]

top_production_list = crop_production_list +livestock_production_list[1:]

top_production_df = uni_df[top_production_list]
top_production_df.head(5)
Out[64]:
(GDP, million $) Cereals (Rice Milled Eqv) Crops Production tonnes Sugar cane Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Maize Crops Production tonnes Wheat Crops Production tonnes Rice, paddy Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Potatoes Crops Production tonnes ... Camels Livestock production Head Poultry Birds Livestock production 1000 Head Mules Livestock production Head Chickens Livestock production 1000 Head Camelids, other Livestock production Head Animals live nes Livestock production Head Ducks Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Turkeys Livestock production 1000 Head Geese and guinea fowls Livestock production 1000 Head
0 1748.886597 3390122.0 55000.0 482000.0 144000.0 667000.0 2081000.0 366000.0 794600.0 144000.0 ... 300000.0 6000.0 35000.0 6000.0 0.0 0.0 0.0 0.0 0.0 0.0
1 1831.108982 3202450.0 50000.0 558000.0 167000.0 670000.0 1915000.0 350000.0 634800.0 167000.0 ... 300000.0 6000.0 35000.0 6000.0 0.0 0.0 0.0 0.0 0.0 0.0
2 1595.555482 3816800.0 35000.0 500000.0 158000.0 720000.0 2450000.0 400000.0 781400.0 158000.0 ... 300000.0 6000.0 31000.0 6000.0 0.0 0.0 0.0 0.0 0.0 0.0
3 1733.333275 4130140.0 58000.0 515000.0 165000.0 760000.0 2700000.0 420000.0 822500.0 165000.0 ... 300000.0 6100.0 26500.0 6100.0 0.0 0.0 0.0 0.0 0.0 0.0
4 2155.555500 4211140.0 51600.0 518000.0 182000.0 770000.0 2750000.0 420000.0 847500.0 182000.0 ... 300000.0 6100.0 26500.0 6100.0 0.0 0.0 0.0 0.0 0.0 0.0

5 rows × 41 columns

In [65]:
#Checking correlations of main crops between each other and with GDP
top_production_correlation_matrix = round (top_production_df.corr(method='pearson'),3)
top_production_correlation_matrix['(GDP, million $)'].sort_values(ascending = False)
Out[65]:
(GDP, million $)                                         1.000
Maize Crops Production tonnes                            0.779
Turkeys Livestock production 1000 Head                   0.705
Soybeans Crops Production tonnes                         0.658
Cereals (Rice Milled Eqv) Crops Production tonnes        0.654
Chickens Livestock production 1000 Head                  0.578
Poultry Birds Livestock production 1000 Head             0.568
Tomatoes Crops Production tonnes                         0.550
Citrus Fruit,Total Crops Production tonnes               0.535
Wheat Crops Production tonnes                            0.508
Fruit excl Melons,Total Crops Production tonnes          0.494
Grapes Crops Production tonnes                           0.452
Horses Livestock production Head                         0.432
Sorghum Crops Production tonnes                          0.410
Sugar beet Crops Production tonnes                       0.395
Vegetables&Melons, Total Crops Production tonnes         0.393
Potatoes Crops Production tonnes                         0.387
Pigs Livestock production Head                           0.367
Cattle Livestock production Head                         0.360
Cattle and Buffaloes Livestock production Head           0.317
Beehives Livestock production No Livestock production    0.294
Roots and Tubers,Total Crops Production tonnes           0.290
Geese and guinea fowls Livestock production 1000 Head    0.282
Ducks Livestock production 1000 Head                     0.278
Barley Crops Production tonnes                           0.250
Rabbits and hares Livestock production 1000 Head         0.245
Sheep and Goats Livestock production Head                0.241
Rice, paddy Crops Production tonnes                      0.240
Sheep Livestock production Head                          0.219
Goats Livestock production Head                          0.215
Sugar cane Crops Production tonnes                       0.167
Bananas Crops Production tonnes                          0.155
Sweet potatoes Crops Production tonnes                   0.155
Mules Livestock production Head                          0.132
Asses Livestock production Head                          0.118
Buffaloes Livestock production Head                      0.087
Animals live nes Livestock production Head               0.049
Cassava Crops Production tonnes                          0.034
Oil, palm fruit Crops Production tonnes                  0.015
Camelids, other Livestock production Head               -0.013
Camels Livestock production Head                        -0.017
Name: (GDP, million $), dtype: float64

Because "Cereals (Rice Milled Eqv) Crops Production tonnes" has such a high correlation with a lot of other features, it is probably an aggregate of them (eg over 90% with wheat).

We would know like to look at some relationships between these measures and the GDP:

In [66]:
#Looking at some relationships
for item in list(top_production_df.columns)[1:]:
    top_production_df.plot(kind='scatter', x=item, y='(GDP, million $)', grid=True)

We can quite clearly see that the production of many of the most popular crops can well be related to GDP.

3.C.b. Trade and GDP

Let's see if the same can be said about the most exported/imported goods.

In [67]:
#Choose the top 20 most exported items by mass
top_exports_df= uni_df.filter(like='export quantities tonnes')
top_exports_df = top_exports_df.sum(axis=0).sort_values(ascending=False)
top_exports_list =top_exports_df.index.tolist()[:20]
top_exports_list

#Take the top 20 exported crops and the columns of our uni_df which contain their production
top_exports_list.insert(0,'(GDP, million $)')
top_exports_df = uni_df[top_exports_list]


#Choose the top 20 most imported items by mass
top_imports_df= uni_df.filter(like='import quantities tonnes')
top_imports_df = top_imports_df.sum(axis=0).sort_values(ascending=False)

top_imports_list =top_imports_df.index.tolist()[:20]
top_imports_list

#Take the top 20 imported items and the columns of our uni_df which contain their production
top_imports_list.insert(0,'(GDP, million $)')
top_imports_df = uni_df[top_imports_list]

#Create a joined list
top_traded_list = top_exports_list +top_imports_list[1:]
In [68]:
top_traded_df = uni_df[top_traded_list]
top_traded_df.sample(5)
Out[68]:
(GDP, million $) Cereals Food export quantities tonnes Wheat+Flour,Wheat Equivalent Food export quantities tonnes Wheat Food export quantities tonnes Maize Food export quantities tonnes Fodder & Feeding stuff Food export quantities tonnes Milk Equivalent Food export quantities tonnes Oilseeds Food export quantities tonnes Sugar and Honey Food export quantities tonnes Oilseed Cake Meal Food export quantities tonnes ... Animal Vegetable Oil Food import quantities tonnes Sugar,Total (Raw Equiv.) Food import quantities tonnes Cake, soybeans Food import quantities tonnes Beverages Food import quantities tonnes Fixed Vegetable Oils Food import quantities tonnes Sugar Raw Centrifugal Food import quantities tonnes Rice Food import quantities tonnes Rice - total (Rice milled equivalent) Food import quantities tonnes Cassava Equivalent Food import quantities tonnes Barley Food import quantities tonnes
2529 847.918929 0.0 0.0 0.0 0.0 0.0 684.0 768.0 1500.0 0.0 ... 91365.0 161300.0 0.0 6633.0 90869.0 24863.0 61500.0 61500.0 0.0 14.0
5600 4766.919343 0.0 0.0 0.0 0.0 20434.0 0.0 10283.0 14766.0 20434.0 ... 91390.0 185334.0 4200.0 16500.0 71790.0 0.0 32290.0 32290.0 0.0 0.0
3411 111.033263 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 91.0 3628.0 0.0 790.0 91.0 2900.0 1301.0 1301.0 0.0 0.0
1839 8194.991104 11.0 11.0 0.0 0.0 689.0 276.0 4.0 85.0 0.0 ... 7611.0 6072.0 0.0 103575.0 7594.0 938.0 26108.0 26108.0 158.0 30.0
3568 408.246508 0.0 0.0 0.0 0.0 0.0 0.0 1693.0 0.0 0.0 ... 4978.0 3833.0 0.0 11957.0 4978.0 0.0 77650.0 77650.0 0.0 0.0

5 rows × 41 columns

We now check the correlations of main traded goods with GDP:

In [69]:
top_traded_correlation_matrix = round (top_traded_df.corr(method='pearson'),3)
top_traded_correlation_matrix['(GDP, million $)'].sort_values(ascending = False)
Out[69]:
(GDP, million $)                                                        1.000
Oilseeds Food export quantities tonnes                                  0.634
Soybeans Food export quantities tonnes                                  0.615
Maize Food export quantities tonnes                                     0.612
Cereals Food export quantities tonnes                                   0.608
Fodder & Feeding stuff Food export quantities tonnes                    0.557
Sugar and Honey Food import quantities tonnes                           0.539
Wheat+Flour,Wheat Equivalent Food export quantities tonnes              0.533
Wheat Food export quantities tonnes                                     0.530
Animal Vegetable Oil Food import quantities tonnes                      0.502
Fixed Vegetable Oils Food import quantities tonnes                      0.494
Beverages Food import quantities tonnes                                 0.463
Sugar,Total (Raw Equiv.) Food import quantities tonnes                  0.422
Sugar Raw Centrifugal Food import quantities tonnes                     0.410
Fodder & Feeding stuff Food import quantities tonnes                    0.403
Oilseeds Food import quantities tonnes                                  0.402
Milk Equivalent Food import quantities tonnes                           0.368
Soybeans Food import quantities tonnes                                  0.360
Cereals Food import quantities tonnes                                   0.343
Oilseed Cake Meal Food import quantities tonnes                         0.331
Maize Food import quantities tonnes                                     0.320
Milk Equivalent Food export quantities tonnes                           0.320
Beverages Food export quantities tonnes                                 0.319
Oilseed Cake Meal Food export quantities tonnes                         0.296
Cassava Equivalent Food import quantities tonnes                        0.294
Cake, soybeans Food export quantities tonnes                            0.278
Rice Food export quantities tonnes                                      0.256
Rice - total  (Rice milled equivalent) Food export quantities tonnes    0.256
Cake, soybeans Food import quantities tonnes                            0.252
Wheat Food import quantities tonnes                                     0.245
Wheat+Flour,Wheat Equivalent Food import quantities tonnes              0.240
Barley Food import quantities tonnes                                    0.231
Rice Food import quantities tonnes                                      0.222
Rice - total  (Rice milled equivalent) Food import quantities tonnes    0.222
Barley Food export quantities tonnes                                    0.220
Animal Vegetable Oil Food export quantities tonnes                      0.196
Sugar and Honey Food export quantities tonnes                           0.184
Fixed Vegetable Oils Food export quantities tonnes                      0.137
Sugar,Total (Raw Equiv.) Food export quantities tonnes                  0.109
Sugar Raw Centrifugal Food export quantities tonnes                     0.060
Cassava Equivalent Food export quantities tonnes                        0.001
Name: (GDP, million $), dtype: float64

It seems that the highest correlations can be found from exports of meat as well as feeding stuff.

We would know like to look at some relationships between these measures and the GDP:

In [70]:
for item in list(top_traded_df.columns)[1:]:
    top_traded_df.plot(kind='scatter', x=item, y='(GDP, million $)', grid=True)
3.C.c. Machine learning model
        1. Model - based on trade

        2. Model - based on production

3.C.c.i. Model - based on production

First, we only create a model using only the production data. Afterwards, we use the data on exports, imports.

Standardization, scaling

In [71]:
_ = sns.distplot(top_production_df["Maize Crops Production tonnes"], rug=False, hist=False)
In [72]:
_ = sns.distplot(np.log10(top_production_df["Maize Crops Production tonnes"]), rug=False, hist=False)

We notice that the production of goods, just as the GDP, has a more "normal looking" distribution when we use the logscale. We will thus create a new uniformed dataframe, with applying the logarithm to all values.

In [73]:
#Using 1+x so as to keep 0 values to 0.
def log(x):
    return np.log10(1+x)
In [74]:
#create a new dataframe with log values, so that we have normal distributions for later analysis
uni_df_log = uni_df.copy()
uni_df_log.iloc[:,2:] = uni_df_log.iloc[:,2:].apply(lambda x : log(x))
In [75]:
top_production_log_df = uni_df_log[top_production_list]
top_production_log_df.sample(5)
Out[75]:
(GDP, million $) Cereals (Rice Milled Eqv) Crops Production tonnes Sugar cane Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Maize Crops Production tonnes Wheat Crops Production tonnes Rice, paddy Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Potatoes Crops Production tonnes ... Camels Livestock production Head Poultry Birds Livestock production 1000 Head Mules Livestock production Head Chickens Livestock production 1000 Head Camelids, other Livestock production Head Animals live nes Livestock production Head Ducks Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Turkeys Livestock production 1000 Head Geese and guinea fowls Livestock production 1000 Head
4861 3.333928 6.145596 6.138281 5.477241 6.336978 5.059359 2.624282 6.283754 5.861066 5.257609 ... 0.000000 4.306446 0.00000 4.146159 0.0 0.0 3.415140 0.0 3.130655 3.361917
3864 4.639486 7.258011 7.147196 6.328551 7.175198 6.410295 0.000000 7.367374 6.472062 5.187524 ... 0.000000 5.070278 0.00000 5.010228 0.0 0.0 4.181358 0.0 0.000000 0.000000
7176 3.330483 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 ... 0.000000 0.000000 0.00000 0.000000 0.0 0.0 0.000000 0.0 0.000000 0.000000
4415 4.190429 2.120574 0.000000 4.463938 2.158362 1.000000 0.000000 0.000000 3.023664 2.158362 ... 3.699057 3.752970 0.00000 3.752970 0.0 0.0 0.000000 0.0 0.000000 0.000000
3700 3.687434 5.887122 6.553907 5.592619 4.489044 5.818073 2.975432 4.603523 6.167309 4.280738 ... 0.000000 4.133571 4.84074 4.133571 0.0 0.0 0.000000 0.0 0.000000 0.000000

5 rows × 41 columns

In [76]:
#We then normalize the data, so as to have comparable ranges. We use the dataframe of log values.

top_production_values = top_production_log_df.values
standard_scaler = preprocessing.StandardScaler()
top_production_stand_values =standard_scaler.fit_transform(top_production_values)
top_production_stand = pd.DataFrame(top_production_stand_values, columns=top_production_log_df.columns)
top_production_stand.sample(5)
Out[76]:
(GDP, million $) Cereals (Rice Milled Eqv) Crops Production tonnes Sugar cane Crops Production tonnes Vegetables&Melons, Total Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Maize Crops Production tonnes Wheat Crops Production tonnes Rice, paddy Crops Production tonnes Fruit excl Melons,Total Crops Production tonnes Potatoes Crops Production tonnes ... Camels Livestock production Head Poultry Birds Livestock production 1000 Head Mules Livestock production Head Chickens Livestock production 1000 Head Camelids, other Livestock production Head Animals live nes Livestock production Head Ducks Livestock production 1000 Head Rabbits and hares Livestock production 1000 Head Turkeys Livestock production 1000 Head Geese and guinea fowls Livestock production 1000 Head
5798 0.415761 -1.730871 -0.900328 -2.255956 -2.155906 -1.410905 -0.986939 -1.019020 -2.345133 -1.297216 ... -0.481401 -2.155140 -0.726811 -2.157192 -0.099386 -0.164975 -0.700909 -0.511421 -0.620294 -0.464012
3089 -0.019914 -1.730871 0.236408 -0.405140 -0.299183 -1.410905 -0.986939 -1.019020 -0.290337 -0.227739 ... -0.481401 -0.595311 -0.726811 -0.617247 -0.099386 -0.164975 0.346689 -0.511421 -0.620294 -0.464012
6928 -0.487272 0.301741 0.512634 0.234194 0.734803 0.534578 0.448816 0.519333 0.896865 0.808732 ... -0.481401 -0.163800 -0.726811 -0.156207 -0.099386 -0.164975 0.147958 1.489665 -0.620294 0.380843
3795 0.097356 -1.730871 -0.900328 -0.556347 -0.326290 -1.410905 -0.986939 -1.019020 -1.736819 0.206097 ... -0.481401 -0.719862 -0.726811 -0.711979 -0.099386 -0.164975 -0.700909 1.804284 -0.620294 -0.464012
631 0.021576 -1.730871 -0.900328 -0.271380 -1.586678 -1.410905 -0.986939 -1.019020 -0.202916 -0.829507 ... 1.008024 -0.366328 -0.726811 -0.355998 -0.099386 -0.164975 -0.700909 -0.511421 -0.620294 -0.464012

5 rows × 41 columns

In [77]:
train_set, test_set = train_test_split(top_production_stand.values, test_size = 0.2, random_state = 1)

X_train_set = train_set[:,1:]
Y_train_set = train_set[:,0]

X_test_set = test_set[:,1:]
Y_test_set = test_set[:,0]
In [78]:
number_of_folds = 5
scores = []
list_of_alpha = [i for i in np.arange(0,10,0.01)]
for alpha in tqdm(list_of_alpha):
    clf = Ridge(alpha = alpha)
    score = cross_val_score(clf, X_train_set, Y_train_set, cv=number_of_folds, scoring = 'neg_mean_squared_error')
    scores.append([alpha, score.mean()])
a=np.array(scores)
best_alpha = a[np.where(a==np.amax(a[:,1]))[0]][0,0]
print("The best value obtained is for alpha equal to " + str(best_alpha) + " with a MSE of "+ str(-a[np.where(a==np.amax(a[:,1]))[0]][0,1]))
alphas = [elt[0] for elt in scores]
MSE = [-elt[1] for elt in scores]
sns.lineplot(alphas, MSE)
_ = plt.title("Cross validation score")
_ = plt.ylabel("Mean Squared Error")
_ = plt.xlabel("alphas")
100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:45<00:00, 21.83it/s]
The best value obtained is for alpha equal to 0.06 with a MSE of 0.32763200811215537
In [79]:
# We build our model with the chosen alpha.
model_top_production = Ridge(alpha=best_alpha)
model_top_production.fit(X_train_set, Y_train_set)
Out[79]:
Ridge(alpha=0.06, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
In [80]:
weights_top_production = pd.DataFrame([model_top_production.coef_], columns=top_production_df.columns[1:])
weights_top_production = weights_top_production.sort_values(by=0, axis=1, ascending=False)                                  
weights_top_production
Out[80]:
Poultry Birds Livestock production 1000 Head Cattle and Buffaloes Livestock production Head Potatoes Crops Production tonnes Tomatoes Crops Production tonnes Sheep Livestock production Head Sugar beet Crops Production tonnes Turkeys Livestock production 1000 Head Oil, palm fruit Crops Production tonnes Wheat Crops Production tonnes Horses Livestock production Head ... Vegetables&Melons, Total Crops Production tonnes Rice, paddy Crops Production tonnes Geese and guinea fowls Livestock production 1000 Head Pigs Livestock production Head Asses Livestock production Head Sheep and Goats Livestock production Head Fruit excl Melons,Total Crops Production tonnes Roots and Tubers,Total Crops Production tonnes Cattle Livestock production Head Chickens Livestock production 1000 Head
0 1.486112 0.6384 0.22552 0.187854 0.145461 0.13441 0.114564 0.109679 0.103595 0.094869 ... -0.053963 -0.085615 -0.098977 -0.117822 -0.179584 -0.255719 -0.328864 -0.346439 -0.54638 -0.769241

1 rows × 40 columns

In [81]:
model_top_production.score(X_train_set, Y_train_set)
Out[81]:
0.6696735614504745

Results are so far middling, but it also need cleaning since there's some aggregate values (e.g. including "total" in their name, that need to be removed). We will continue working on this for next milestone.

3.C.c.ii. Model - based on trade
In [82]:
#Use dataframe with log values
top_traded_log_df = uni_df_log[top_traded_list]
In [83]:
#We then normalize the data, so as to have comparable ranges. We use the dataframe of log values.

top_traded_values = top_traded_log_df.values
standard_scaler = preprocessing.StandardScaler()
top_traded_stand_values =standard_scaler.fit_transform(top_traded_values)
top_traded_stand = pd.DataFrame(top_production_stand_values, columns=top_traded_log_df.columns)
top_traded_stand.sample(5)
Out[83]:
(GDP, million $) Cereals Food export quantities tonnes Wheat+Flour,Wheat Equivalent Food export quantities tonnes Wheat Food export quantities tonnes Maize Food export quantities tonnes Fodder & Feeding stuff Food export quantities tonnes Milk Equivalent Food export quantities tonnes Oilseeds Food export quantities tonnes Sugar and Honey Food export quantities tonnes Oilseed Cake Meal Food export quantities tonnes ... Animal Vegetable Oil Food import quantities tonnes Sugar,Total (Raw Equiv.) Food import quantities tonnes Cake, soybeans Food import quantities tonnes Beverages Food import quantities tonnes Fixed Vegetable Oils Food import quantities tonnes Sugar Raw Centrifugal Food import quantities tonnes Rice Food import quantities tonnes Rice - total (Rice milled equivalent) Food import quantities tonnes Cassava Equivalent Food import quantities tonnes Barley Food import quantities tonnes
5993 -0.249097 0.381123 1.202879 -0.019528 0.133570 0.621130 -0.986939 0.894572 0.335640 0.299069 ... -0.481401 0.180784 1.553474 0.194901 -0.099386 -0.164975 -0.700909 -0.511421 -0.620294 -0.464012
2191 0.515706 0.756432 -0.900328 0.499506 0.558272 1.083005 1.112150 -1.019020 0.476759 0.932901 ... -0.481401 0.400615 -0.726811 0.393613 -0.099386 -0.164975 0.888029 -0.511421 1.500075 1.386713
6777 1.015646 0.840001 -0.900328 1.111315 0.681882 0.584699 0.421340 1.502388 0.710340 0.892142 ... -0.481401 0.808166 0.341206 0.822901 -0.099386 6.391469 1.257568 1.753395 -0.093537 0.234757
7484 -0.864762 -1.730871 -0.900328 -2.255956 -2.155906 -1.410905 -0.986939 -1.019020 -2.345133 -1.297216 ... -0.481401 -2.155140 -0.726811 -2.157192 -0.099386 -0.164975 -0.700909 -0.511421 -0.620294 -0.464012
746 0.298567 0.833903 -0.900328 0.664073 1.043551 -0.241395 0.890469 -1.019020 0.421463 1.331633 ... -0.481401 0.686845 -0.726811 0.693486 -0.099386 -0.164975 -0.700909 1.315612 1.726685 -0.464012

5 rows × 41 columns

In [84]:
train_set, test_set = train_test_split(top_traded_stand.values, test_size = 0.2, random_state = 1)

X_train_set = train_set[:,1:]
Y_train_set = train_set[:,0]

X_test_set = test_set[:,1:]
Y_test_set = test_set[:,0]
In [85]:
number_of_folds = 5
scores = []
list_of_alpha = [i for i in np.arange(0,10,0.01)]
for alpha in tqdm(list_of_alpha):
    clf = Ridge(alpha = alpha)
    score = cross_val_score(clf, X_train_set, Y_train_set, cv=number_of_folds, scoring = 'neg_mean_squared_error')
    scores.append([alpha, score.mean()])
a=np.array(scores)
best_alpha = a[np.where(a==np.amax(a[:,1]))[0]][0,0]
print("The best value obtained is for alpha equal to " + str(best_alpha) + " with a MSE of "+ str(-a[np.where(a==np.amax(a[:,1]))[0]][0,1]))
alphas = [elt[0] for elt in scores]
MSE = [-elt[1] for elt in scores]
sns.lineplot(alphas, MSE)
_ = plt.title("Cross validation score")
_ = plt.ylabel("Mean Squared Error")
_ = plt.xlabel("alphas")
100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:44<00:00, 22.65it/s]
The best value obtained is for alpha equal to 0.06 with a MSE of 0.32763200811215537
In [86]:
# We build our model with the chosen alpha.
model_trade = Ridge(alpha=best_alpha)
model_trade.fit(X_train_set, Y_train_set)
Out[86]:
Ridge(alpha=0.06, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
In [87]:
weights_trade = pd.DataFrame([model_trade.coef_], columns=top_traded_stand.columns[1:])
weights_trade = weights_trade.sort_values(by=0, axis=1, ascending=False)                                  
weights_trade
Out[87]:
Sugar,Total (Raw Equiv.) Food import quantities tonnes Wheat+Flour,Wheat Equivalent Food import quantities tonnes Oilseed Cake Meal Food export quantities tonnes Sugar Raw Centrifugal Food export quantities tonnes Maize Food import quantities tonnes Soybeans Food export quantities tonnes Cassava Equivalent Food import quantities tonnes Fixed Vegetable Oils Food export quantities tonnes Milk Equivalent Food export quantities tonnes Soybeans Food import quantities tonnes ... Wheat Food export quantities tonnes Oilseeds Food export quantities tonnes Barley Food import quantities tonnes Fodder & Feeding stuff Food import quantities tonnes Oilseed Cake Meal Food import quantities tonnes Cereals Food import quantities tonnes Sugar and Honey Food export quantities tonnes Maize Food export quantities tonnes Wheat Food import quantities tonnes Beverages Food import quantities tonnes
0 1.486112 0.6384 0.22552 0.187854 0.145461 0.13441 0.114564 0.109679 0.103595 0.094869 ... -0.053963 -0.085615 -0.098977 -0.117822 -0.179584 -0.255719 -0.328864 -0.346439 -0.54638 -0.769241

1 rows × 40 columns

In [88]:
model_trade.score(X_train_set, Y_train_set)
Out[88]:
0.6696735614504745

This models also needs to be cleaned, because the one with lowest weights is actually a part of the one with highest.

But it seems we can already draw some conclusions: cake, soybeans, as well as oilseed cake meal food and fodder & Feeding stuff are all used as animal feed. The fact that they're among the features with the highest weights clearly indicates that there's a link between GDP and high amounts of imports of animal feed.

3.D. Food prices stability and agricultural features

In this part, we plan to draw for next milestone a similar analysis with CPI as with GDP in previous section.

4. Informed plan for next actions

With clean dataframes we can now really focus on producing some interesting results. Our initial idea was to observe the effects of different crops and food items on the economic growth of the countries as well as the differences in self-sufficiency. Along the exploration of the data, we turned our attention toward prediction models. Training a Ridge model on our data will allow us to identify the agricultural products that are the most correlated with the economic growth of countries (predict the GDP based on agricultural features). The second model (prediction of the CPI variation rate) will allow us to identify products linked with economic stability.

The identification of such items would give an interesting insight toward understanding geopolotical strategies and challenges. Further insight might be gained by identifying who the producers of these "economically strong" crops and animal products are and visualising the geographical repartition of the most important ressources.

Our objectives for the following weeks are:

  1. Revise the order of analysis - for instance putting net food exporters and importers at the end of the notebook.
  2. Fine tune our machine learning models by finding satisfying and meaningful ways to normalise the data without introducing bias, by doing a meaningful parameter optimisation and by removing aggregation variables such as "Rice-total".
  3. Identify the "strong" agricultural products.
  4. Find a clean way to calculate price stability with the given CPI values.
  5. Create a model for price stability and correlating it to different agricultural products.
  6. Identify the products we want to use for the self-sufficiency calculation.
  7. Compute the correlation between price stability and self-sufficiency.
  8. Visualise the repartition of the production of "strong" products as well as their trading flow (who buys, who sells).
  9. Share our results in the form of a data story.